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Introduction 


Inverse  planning  is  at  the  heart  of  prostate  Volumetric  Modulated  Arc  Therapy  (VMAT)  treatment  procedure 
and  critically  determines  its  level  of  success.  As  practiced  now,  the  capacity  of  VMAT  is  greatly  underutilized 
because  of  inferior  computing  performance  of  existing  optimization  methods.  An  alternative  mathematical 
approach  that  improves  both  the  efficiency  and  the  efficacy  is  needed  and  is  the  center  of  this  research. 
We  propose  to  develop  a  new  innovative  inverse  planning  tool,  based  on  the  novel  idea  of  superiorization, 
to  replace  the  classical  constrained  optimization  approaches  employed  in  clinics  today  for  prostate  VMAT 
cases. 

Towards  this  goal,  year  1  of  the  training  award  focused  on  formulating  the  VMAT  problem  as  a  con¬ 
strained  superiorization  problem  and  on  the  development  of  a  framework  of  fast  converging  inverse  planning 
algorithms.  The  new  approach  was  then  implemented,  tested  and  evaluated  on  a  previously  treated  prostate 
cancer  case  where  initial  results  were  obtained. 


Body 

1.  Research  Accomplishments 

SOW  Aim  1:  Develop  algorithms  for  inverse  planning  using  superiorization  techniques 
for  prostate  VMAT 


Meeting  the  goal  outlined  in  the  SOW  aim  1,  we  have  studied  the  problem  of  inverse  planning  for  prostate 
VMAT  and  developed  a  framework  of  algorithms  using  the  superiorization  methodology  that  is  specifically 
tailored  to  this  application.  We  first  defined  the  problem  mathematically  by  reformulating  it  as  a  linear 
feasibility  problem  (instead  of  a  minimization  problem)  and  suggested  a  solution  to  solve  it  using  the  su¬ 
periorization  methodology.  In  developing  the  tools,  we  have  also  generalized  the  approach  to  include  other 
medical  physics  applications,  and  provided  conditions  that  are  simple  to  meet  both  in  theory  and  in  practice. 
Our  claims  were  proved  mathematically  and  the  results  were  submitted  two  journal  (archival)  publications 
[2,4]. 


Task  1:  Formulating  the  VMAT  treatment  planning  as  a  constrained  superiorization  problem 


Our  approach  to  a  VMAT  treatment  planning  started  by  studying  the  current  mathematical  models  used 
for  this  application.  Since  the  superiorization  methodology  requires  a  different  mathematical  formulation, 
the  first  step  was  to  model  the  problem  accordingly. 

Consider  the  system  of  equations 

Ax  =  d,  (1) 


where  A  is  the  J  x  I  dose  matrix  that  maps  any  intensity  of  beamlets  vector  x  =  (xi)l=1  £  R1  onto  a  dose  in 
voxels  vector  d  =  (dj)j=1  £  RJ .  Here  /  is  the  total  number  of  beamlets  and  J  is  the  total  number  of  voxels. 
The  minimization  problem  can  then  be  formulated  as 


minimize  y)f=1  A s  ||Asa:  —  d(s) 
subject  to  x  >  0, 


(2) 


where  the  index  s  stands  for  different  structures,  As  is  the  submatrix  of  A  related  to  structure  s  and  d(s)  is 
the  subvector  of  d  related  to  structure  s,  respectively,  and  As  is  the  importance  factor  associated  with  the 
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sth  structure  which  is  decided  by  the  planner.  There  is  an  assumption  that  x  is  achievable  using  apertures 
(aperture  constraints). 

Assume  that  we  have  S  structures,  for  s  =  1,  2, . . . ,  S,  (including  the  complement  of  all  identified  struc¬ 
tures),  and  denote  by  Os  the  set  of  indices  of  voxels  that  belong  to  the  sth  structure,  such  that 


Os  —  {is, 1,  is, 2,  •  •  -is,m(s)} 


(3) 


where  ?n(s)  is  the  number  of  voxels  in  this  structure, 
blocks 

A  = 


Then  the  system  matrix  A  can  be  partitioned  into 

1 


so  that  a  submatrix  As  will  contain  the  rows  from  A  whose  indices  appear  in  Os,  (similarly,  let  drs\  be  the 
subvector  of  d  whose  component  indices  appear  in  Os )  and  then  the  system  becomes 


'  A,  - 

(  dA)  \ 

A-2 

d(2) 

X  = 

As 

\  d(S)  ) 

An  optimization  method  aims  at  satisfying  the  system  (1)  (equivalently  (5))  while  minimizing  a  given  ob¬ 
jective  function. 


Reformulating  the  problem  as  a  constrained  superiorization  problem:  We  suggest  the  follow¬ 
ing  modifications  to  the  above  modality.  Replace  the  prescription  method  that  gives  rise  to  the  system 
Ax  =  d  in  (1)  by  a  more  flexible  one  in  which  we  ask  the  planner  to  provide  lower-  and  upper-  dose  bounds 
vectors,  d  and  d.  respectively,  on  all  voxels  in  all  structures,  and  instead  of  (2)  we  aim  at  solving  the  following 
linear  feasibility  problem 

d  <  Ax  <  d.  (6) 

By  transforming  the  problem  of  (1)  into  a  linear  feasibility  problem  of  the  form  (6),  we  allow  many  iterative 
projection  method  to  derive  a  solution.  This  enables  a  formulation  for  the  superiorization  methodology  to  be 
applied  to  VMAT  inverse  planning  problem  since  many  of  these  algorithms  are  also  perturbation  resilient. 
Specifically,  methods  that  belong  to  the  two  classes  of  projection  methods,  String  Averaging  Projection 
(SAP)  and  Block-Iterative  Projection  (BIP)  methods,  can  be  applied  towards  solving  this  formulation  and 
achieve  finding  a  superior  solution  in  addition  to  satisfying  the  feasibility  constraints  (see  [1,  3]).  That  is,  an 
x  obtainable  by  a  projection  method  alone  will  be  an  intensity  of  beamlets  vector  trying  to  solve  (6),  while 
using  a  projection  method  that  is  also  perturbation  resilient  allows  for  obtaining  an  x  that  solves  (6)  but 
also  provides  a  solution  that  is  superior  with  respect  to  an  objective  function. 

The  solution  vector  x  of  the  beamlet  intensities  that  results  from  the  superiorization  approach  will  then 
be  evaluated.  Tools  such  as  dose  volume  histograms  (DVHs)  will  help  assess  conformality  to  the  prostate 
(the  target)  and  to  the  organs  at  risk  (OAR).  These  will  be  compared  against  what  is  recommended  by 
a  physician  in  the  clinic  and  governed  by  the  specifications  of  the  Radiation  Treatment  Oncology  Group 
(RTOG)  protocol  for  prostate  cancer  patients  [5]. 
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The  adaptation  to  our  model  based  on  the  RTOG  protocol  is  as  follows:  Given  a  structure  s  that  is  an 
OAR,  we  define  to  be  the  upper-bound  subvector  of  the  prescribed  dose 

d(s)  =  ^(s)»  (7) 

and  define  d(s)  to  be  a  lower-bound  subvector  for  any  target  structure  s 

d(s)  =  d(ay  (8) 

This  allows  the  constraints  in  (6)  to  be  written  as 

0  <  Asx  <  d(s),  (9) 

for  an  OAR  structure  s  and  as 

d(a)  <  Asx  <  e(s),  (10) 

for  a  target  structure  s,  where  is  a  clinically-specified  upper-bound  subvector  for  the  target. 

In  assessing  the  solution  provided  by  the  superiorization  method,  if  the  acceptance  criteria  is  not  met, 
then  a  refined  selection  of  d  and  d  will  be  provided  and  the  process  will  repeat  until  a  superior  feasible 
solution  is  found  (this  step  is  identical  to  how  it  is  done  in  the  clinic  today). 


Task  2:  Development  of  a  framework  for  fast  converging  inverse  planning  superiorization  techniques 
Task  7:  Investigate  the  underlying  principles  and  put  their  concept  on  a  firm  mathematical  ground 

In  developing  a  framework  for  fast  converging  inverse  planning  superiorization  techniques  we  first  identified 
several  problems  that  currently  exist  in  optimization  methods.  In  classical  optimization  it  is  assumed  that 
there  is  a  constraints  set  C  and  the  task  is  to  find  an  a :  £  C  for  which  <j>(x)  is  minimal.  Problems  with 
this  approach  are  the  following:  (1)  The  constraints  may  not  be  consistent  and  so  C  could  be  empty  and 
the  optimization  task  as  stated  would  not  have  a  solution.  (2)  Iterative  methods  of  classical  constrained 
optimization  typically  converge  to  a  solution  only  in  the  limit.  In  practice  some  stopping  rule  is  applied  to 
terminate  the  process  and  the  actual  output  at  that  time  may  not  be  in  C  and,  even  if  it  is  in  C,  it  is  most 
unlikely  to  be  a  minimizer  of  <j>  over  C . 

Both  problems  were  addressed  in  the  newly  developed  superiorization  framework.  Mathematical  defini¬ 
tions  and  conditions  were  introduced  and  were  theoretically  proven.  The  new  foundations  include  two  new 
notions  of  constraints- compatibility  and  strong  perturbation  resiliency.  The  new  concepts  allow  to  take  into 
the  modality  the  infeasibility  and  practical  convergence  problems  that  exist  in  optimization  methods.  More 
specifically,  in  the  superiorization  model  we  suggested  to  replace  the  constraints  set  C  by  a  nonnegative 
real-valued  function  Pr  that  serves  as  an  indicator  of  how  incompatible  a  given  x  is  with  the  constraints. 
Then  the  merit  of  an  actual  output  of  an  algorithm  is  given  by  the  smallness  of  the  two  numbers  Pr(x)  and 
4>{x).  Roughly,  if  an  iterative  algorithm  produces  an  output  x,  then  its  superiorized  version  will  produce  an 
output  x'  for  which  Pr(x')  is  not  larger  then  Pr{x),  but  (in  general)  4>(x')  is  much  smaller  than  <t>(x). 

In  addition  to  the  theoretical  developments  of  superiorization,  a  practical  and  systematic  way  was  de¬ 
veloped  to  turn  any  iterative  algorithm  that  solves  a  feasibility  problem  into  an  algorithm  that  does  supe¬ 
riorization.  For  an  iterative  algorithm  P  and  for  any  optimization  criterion  (j>  for  which  we  know  how  to 
produce  nonascending  vectors  (see  definition  p.  5536  in  [4]),  the  following  pseudocode  automatically  takes 
P  and  produces  a  version  of  P  that  is  superiorized  for  <f>  (exact  details  of  the  procedure  can  be  found  on 
page  5537  in  [4]): 
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Superiorized  Version  of  the  Basic  Algorithm 
1.  set  k  =  0 


2.  set  yk  =  y° 

3.  set  £  =  —  1 

4.  repeat 

5.  set  n  =  0 

6.  set  yk’n  =  yk 

7.  while  n<N 

8.  set  vk,n  to  be  a  nonascending  vector  for  at  yk’n 

9.  set  loop=true 

10.  while  loop 

11.  set  £  =  £  +  1 

12.  set  /3fe>n  =  rje 

13.  set  x  =  yk’n  +  /3k,nvk’n 

14.  if  <j>  (z)<4>  (yk)  then 

15.  set  n=n  +  1 

16.  set  yk’n=z 

17.  set  loop  =  false 

18.  set  yk+1=Ac  ( yk,N ) 

19.  set  k  =  k  +  1 

By  bridging  the  gap  that  typically  exist  between  theory  and  practice  in  the  new  model,  superiorization  was 
made  more  general.  That  is,  the  framework  fit  many  other  medical  physics  application,  not  just  VMAT 
or  radiation  therapy  inverse  planning  type  applications.  All  the  results  mentioned  briefly  here  have  been 
published  in  an  archival  journal  publication  in  the  journal  of  Medical  Physics  [4],  see  the  Appendix  Section 
for  the  full  manuscript. 

Another  accomplishment  related  to  this  task  touches  on  an  additional  aspect  of  superiorization.  Con¬ 
strained  optimization  problems  that  arise  in  real-life  applications  are  often  huge  (such  an  example  is  the 
VMAT  problem).  It  can  then  happen  that  the  traditional  algorithms  for  constrained  optimization  require 
computational  resources  that  are  not  easily  available  and,  even  if  they  are,  the  length  of  time  needed  to 
produce  an  acceptable  output  is  too  long  to  be  practicable.  As  part  of  our  goal  to  show  that  superioriza¬ 
tion  can  handle  large  size  problems  efficiently,  we  have  illustrated  that  the  computational  requirements  of  a 
superiorized  algorithm  can  be  significantly  less  than  that  of  a  traditional  optimization  algorithm,  by  report¬ 
ing  on  a  comparison  of  superiorization  with  the  projected  subgradient  method  (PSM),  which  is  a  standard 
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T otal  V ariation  value 

Time  (seconds) 

projected  subgradient  method 

919 

2217 

superiorization  method 

873 

102 

Table  1:  Performance  comparison  of  the  projected  subgradient  method  and  the  superiorization  method  with 
Total  Variation  as  the  objective  function. 


method  of  classical  optimization.  Table  1  summarizes  the  comparison  we  have  performed  between  the  PSM 
method  and  the  superiorization  method.  In  our  experiment,  we  set  the  the  stopping  rule  to  guarantee  that 
the  output  of  the  superiorization  method  is  at  least  as  constraints-compatible  as  the  output  of  the  PSM. 
The  superiorization  method  showed  clearly  superior  efficacy  to  the  PSM:  it  obtained  a  result  with  a  lower 
objective  function  value  (TV)  at  less  than  one  twentieth  of  the  computational  cost. 

The  complete  report  that  summarizes  this  work  was  submitted  to  the  Journal  of  Optimization  Theory 
and  Applications  and  is  currently  under  review  [2].  It  is  attached  to  this  report  in  the  Appendix  Section. 


Task  3:  Implementation  and  testing  of  the  developed  algorithms 

In  this  task  we  wanted  to  assess  our  proposed  approach  to  using  superiorization  on  a  realistically  yet  simple 
test  case.  The  goal  set  here  is  two-fold:  the  first  is  to  show  that  the  developed  method  can  produce  good  re¬ 
sults  and  the  second  is  to  obtain  a  clear  indication  if  the  nonacsending-type  superiorization  techniques  should 
be  replaced  with  alternative  derivative-free  approaches  (see,  SOW  Task  4:  the  development  of  alternative 
derivative- free  techniques  to  superiorization). 

We  proposed  to  use  as  a  test  case  in  this  task  a  previously  treated  intensity  modulated  radiation  therapy 
(IMRT)  prostate  patient  case.  As  was  explained  in  the  research  proposal,  the  VMAT  technique  delivers 
an  IMRT  type  treatment  in  a  single  arc.  Getting  good  results  on  a  previously  treated  IMRT  case  would 
establish  a  level  of  confidence  that  the  superiorization  method  can  deliver  superior  results  by  referencing  a 
previously  treated  clinical  case.  The  modality  that  was  given  above  (in  Task  1)  is  identical  for  these  two 
radiation  therapy  techniques  (i.e. ,  IMRT  and  VMAT);  the  difference  lies  in  the  size  of  the  problem  and 
its  level  of  complexity.  Since  superiorization  was  never  tried  with  any  type  of  radiation  therapy  treatment 
planning  it  is  important  to  provide  such  evidence  on  an  actual  clinical  case.  The  mathematical  model  which 
we  have  developed  along  with  the  theories  and  proofs  of  the  superiorization  methodology  (in  Task  2)  fit  both 
problems.  Satisfactory  results  will  encourage  us  to  continue  develop  the  method  as  it  is  proposed  in  Tasks 
1  and  2  and  tailor  it  further  more  to  the  VMAT  approach. 

Algorithmic  operator  and  objective  function  The  framework  that  was  developed  is  quite  general  for 
many  medical  physics  applications.  With  the  modality  of  the  superiorization  approach  in  (6),  a  choice  for 
a  projection  operator  that  is  perturbation  resilient  is  needed  as  well  as  a  choice  of  an  objective  function. 
The  algorithmic  operator  that  was  chosen  for  our  implementation  was  the  Algebraic  Reconstruction  Tech¬ 
nique  (ART)  for  inequalities  constraints.  This  operator  was  proven  to  be  perturbation-resilient  in  [1],  The 
constraints  of  the  system  in  (6)  can  be  thought  of  as  hyperslabs.  The  algorithm  projects  the  current  point 
according  to  its  location  in  relation  to  the  two  hyperplanes  that  form  a  hyperslab.  A  geometrical  descrip¬ 
tion  to  this  feasibility  problem  is  provided  in  Figure  1.  The  analytical  formulation  associated  with  it  is  the 


case  C 


Figure  1:  Geometrical  description  of  ART  with  inequalities  constraints 


following: 

!xk,  if  c,-  <  (a®,  xk)  <  di  (case  A), 

xk  +  if  di<(a\xk)  (case  B),  (11) 

xk  +  XkC'  ii^h^  ^  a\  if  (a\xk)  <  Ci  (case  C). 

The  objective  function  used  in  our  implementation  was  the  total  variation  (TV)  functional  of  the  beamlet 
intensity  vector  x ,  see  Eq.  (12)  in  [4]  and  the  discussion  in  the  research  proposal  under  Specific  Aim  1 
regarding  this  choice.  We  denote  herein  the  superiorization  algorithm  that  uses  TV  as  the  objective  function 
by  TV-Superiorization. 

Prostate  patient  data  and  planning  The  data  for  testing  the  approach  were  of  a  previously  treated 
prostate  cancer  patient.  A  seven  field  radiation  treatment  IMRT  plan  was  created.  The  organs  that  were 
included  in  the  plan  were  the  prostate  (target),  rectum,  bladder,  small  bowel  (OARs)  and  the  full  body. 
Figure  2  shows  the  CT  and  the  contours  of  these  structures.  Using  RTOG  0815  [5]  we  set  in  Table  2  the 
acceptance  criteria  for  the  implemented  TV-Superiorization  algorithm. 

Results  We  compared  the  results  when  superiorization  was  present  versus  when  it  was  not.  The  TV- 
Superiorization  algorithm  was  able  to  meet  the  RTOG  acceptance  criteria  while  the  one  without  TV- 
Superiorization  was  not.  Moreover,  the  TV-Superiorization  algorithm  was  able  to  achieve  this  in  a  relatively 
short  amount  time  of  only  12  iterations.  Figure  3  shows  the  DVH  curves  of  the  two  algorithms  side-by-side. 
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Figure  2:  CT  of  the  prostate  patient  case  used  in  the  experiment. 


Organ 

Target? 

Acceptance  criteria 

I.  Prostate 

Yes 

1.  Dose  will  be  normalized  s.t.  98%  of  the  PTV  receives  the 
prescription  dose.  (Prescribed  dose  to  PTV  is  79.2  Gy.) 

2.  The  maximum  allowable  dose  within  the  PTV  is  107% 

of  the  prescribed  dose  (i.e. ,  maximum  allowed  dose  is  84.744  Gy). 

3.  The  minimum  allowable  dose  within  the  PTV  should  be  >95% 
of  the  prescribed  dose  (i.e.,  100%  of  the  dose  should  be  >75.24  Gy. 

II.  Rectum 

No 

1.  No  more  than  15%  volume  receives  dose  that  exceeds  75  Gy 

2.  No  more  than  25%  volume  receives  dose  that  exceeds  70  Gy 

3.  No  more  than  35%  volume  receives  dose  that  exceeds  65  Gy 

4.  No  more  than  50%  volume  receives  dose  that  exceeds  60  Gy 

III.  Bladder 

No 

1.  No  more  than  15%  volume  receives  dose  that  exceeds  80  Gy 

2.  No  more  than  25%  volume  receives  dose  that  exceeds  75  Gy 

3.  No  more  than  35%  volume  receives  dose  that  exceeds  70  Gy 

4.  No  more  than  50%  volume  receives  dose  that  exceeds  65  Gy 

IV.  Small  Bowel 

No 

1.  Upper  bound  is  set  to  52  Gy. 

Table  2:  Acceptance  criteria  for  prostate  patients  according  to  RTOG  0815  [5]. 
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DVH  TV-Superiorization  I: 

With  [solid-lines],  Without  [dashed-lines,  2], 
Both  after  1 2  iterations 
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Figure  3:  DVH  plots  for  a  prostate  case  experiment  with  and  without  TV-Superiorization. 


The  solid  lines  represent  the  TV-Superiorization  algorithm  and  the  dashed  lines  represent  the  algorithm 
without  Superiorization.  The  corresponding  numbers  for  assessing  the  acceptance  criteria  are  specified  in 
Table  3.  As  can  be  seen,  the  criteria  that  is  based  on  the  RTOG  protocol  [5]  was  fully  met  by  the  superior¬ 
ization  method  for  this  prostate  case. 


Training  Accomplishments 

Task  8:  Seminar,  lectures  and  meetings 
Task  9:  Research  training 
Task  10:  Clinical  training 

During  year  1  of  the  training  award  the  PI  had  attended  regular  meetings,  seminars  and  journal  clubs  with 
presentations  on  topics  related  to  radiation  therapy  treatment  planning.  Other  presentations  of  visiting 
scholars  and  professionals  were  also  available  throughout  the  year  and  had  enriched  his  knowledge  on  the 
topic.  The  PI  was  trained  in  the  clinic  to  operate  the  Eclipse  system  stations  for  treatment  planning  available 
at  Stanford  Cancer  Center  (Eclipse  is  a  commercial  tool  for  treatment  planning  developed  by  Varian  Medical 
Systems).  He  collaborated  with  radiation  oncologists,  radiation  therapists,  physicists  and  dosimetrists  and 
obtained  first-hand  the  knowledge  and  experience  of  the  process  of  prostate  radiation  treatment  planning. 
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Organ 

Target? 

Criterion 

TV-Superiorization 

I.  Prostate 

Yes 

%vol  >  79.2  Gy  =  98 

%vol  >  79.2  Gy  =  98 

%vol  >  84.744  Gy  =  0 

%vol  >  84.6  Gy  =  0 

%vol  >  75.24  Gy  =  100 

%vol  >  75.24  Gy  =  100 

II.  Rectum 

No 

%vol  >  75  Gy  <  15 

%vol  >  75  Gy  <  12.7 

%vol  >  70  Gy  <  25 

%vol  >  70  Gy  <  18.6 

%vol  >  65  Gy  <  35 

%vol  >  65  Gy  <  25.8 

%vol  >  60  Gy  <  50 

%vol  >  60  Gy  <  34.5 

III.  Bladder 

No 

%vol  >  80  Gy  <  15 

%vol  >  80  Gy  <  2.2 

%vol  >  75  Gy  <  25 

%vol  >  75  Gy  <  4.9 

%vol  >  70  Gy  <  35 

%vol  >  70  Gy  <  6.8 

%vol  >  65  Gy  <  50 

%vol  >  65  Gy  <  8.7 

IV.  Small  Bowel 

No 

%vol  >  52  Gy  <  0 

%vol  >  1.4  Gy  <  0 

Table  3:  Results  of  the  criteria  for  the  TV-Superiorization  algorithm. 


Key  Research  Accomplishments 

•  Formulated  the  VMAT  treatment  planning  as  a  constrained  superiorization  problem. 

•  Developed  a  framework  for  fast  converging  inverse  planning  superiorization  techniques. 

•  Derived  the  necessary  conditions  of  the  superiorization  framework  for  VMAT  treatment  planning 

•  Placed  the  newly  developed  concepts  on  a  firm  mathematical  ground. 

•  Implemented  and  tested  the  new  superiorization  framework  and  showed  good  initial  results. 

•  Trained  in  treating  prostate  cancer  as  it  is  done  in  the  clinic  today. 


Reportable  Outcomes 

•  Two  journal  publications  were  submitted.  One  appeared  in  the  journal  of  Medical  Physics  and  another 
is  currently  under  review: 

1.  G.T.  Herman,  E.  Garduno,  R.  Davidi  and  Y.  Censor,  Superiorization:  An  optimization  heuristic 
for  medical  physics,  Medical  Physics  39  (2012),  5532-5546. 

2.  Y.  Censor,  R.  Davidi,  G.T.  Herman,  R.W.  Schulte  and  L.  Tetruashvili,  Projected  subgradient 
minimization  versus  superiorization,  Journal  of  Optimization  Theory  and  Applications  (submit¬ 
ted). 

•  The  above  work  has  been  accepted  for  presentation  at  the  joint  workshop  sponsored  by  the  American 
Society  for  Therapeutic  Radiology  and  Oncology  (ASTRO),  the  National  Cancer  Institute  (NCI)  and 
the  American  Association  of  Physicists  in  Medicine  (AAPM)  ,  June  13-14,  2013,  National  Institutes  of 
Health,  Bethesda,  MD,  USA. 
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•  The  above  work  has  been  accepted  for  presentation  at  the  workshop  on  Projection  Methods:  Theory 
and  Practice,  June  19-21,  2013,  Fraunhofer  Institute  for  Industrial  Mathematics  ITWM,  Kaiserslautern, 
Germany. 


Conclusion 

In  year  1  we  were  able  to  extend  the  superiorization  methodology  into  a  larger  framework,  one  that  is  more 
realistic  from  the  point  of  view  of  the  application  at  hand,  by  taking  into  account  the  discrepancy  that  exist 
between  theory  and  practice  and  incorporate  it  into  our  model,  we  minimized  potential  issues  that  typically 
appear  when  a  theory  is  applied  to  a  real-life  application. 

Superiorization  was  developed  to  be  a  general  tool  for  medical  physics  applications.  It  is  capable  of 
turning  any  iterative  algorithm  that  tries  to  satisfy  a  set  of  constraints  into  one  that  is  also  capable  of 
superiorizing  an  objective  function.  The  work  that  came  out  of  this  research  can  help  other  applications 
that  use  optimization  methods  as  the  main  tool. 

Using  the  above  methodology,  we  tailored  it  specifically  to  solve  the  problem  of  VMAT  in  radiation 
therapy  inverse  planning.  The  initial  results  obtained  on  a  realistic  IMRT  prostate  case  were  satisfactory 
and  show  good  indication  that  superiorization  works  and  can  be  applied  to  a  radiation  treatment  planning 
problem  such  as  IMRT  and  VMAT. 

Our  next  steps  include  the  full  implementation,  testing  and  evaluation  of  VMAT  prostate  cases.  A 
thorough  investigation  as  detailed  in  the  SOW  is  planned  in  year  2  of  this  training  award. 
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Purpose:  To  describe  and  mathematically  validate  the  superiorization  methodology,  which  is  a  re¬ 
cently  developed  heuristic  approach  to  optimization,  and  to  discuss  its  applicability  to  medical  physics 
problem  formulations  that  specify  the  desired  solution  (of  physically  given  or  otherwise  obtained  con¬ 
straints)  by  an  optimization  criterion. 

Methods:  The  superiorization  methodology  is  presented  as  a  heuristic  solver  for  a  large  class  of 
constrained  optimization  problems.  The  constraints  come  from  the  desire  to  produce  a  solution  that 
is  constraints-compatible,  in  the  sense  of  meeting  requirements  provided  by  physically  or  otherwise 
obtained  constraints.  The  underlying  idea  is  that  many  iterative  algorithms  for  finding  such  a  solution 
are  perturbation  resilient  in  the  sense  that,  even  if  certain  kinds  of  changes  are  made  at  the  end  of  each 
iterative  step,  the  algorithm  still  produces  a  constraints-compatible  solution.  This  property  is  exploited 
by  using  permitted  changes  to  steer  the  algorithm  to  a  solution  that  is  not  only  constraints-compatible, 
but  is  also  desirable  according  to  a  specified  optimization  criterion.  The  approach  is  very  general,  it 
is  applicable  to  many  iterative  procedures  and  optimization  criteria  used  in  medical  physics. 

Results:  The  main  practical  contribution  is  a  procedure  for  automatically  producing  from  any  given 
iterative  algorithm  its  superiorized  version,  which  will  supply  solutions  that  are  superior  according 
to  a  given  optimization  criterion.  It  is  shown  that  if  the  original  iterative  algorithm  satisfies  certain 
mathematical  conditions,  then  the  output  of  its  superiorized  version  is  guaranteed  to  be  as  constraints- 
compatible  as  the  output  of  the  original  algorithm,  but  it  is  superior  to  the  latter  according  to  the 
optimization  criterion.  This  intuitive  description  is  made  precise  in  the  paper  and  the  stated  claims 
are  rigorously  proved.  Superiorization  is  illustrated  on  simulated  computerized  tomography  data  of 
a  head  cross  section  and,  in  spite  of  its  generality,  superiorization  is  shown  to  be  competitive  to  an 
optimization  algorithm  that  is  specifically  designed  to  minimize  total  variation. 

Conclusions:  The  range  of  applicability  of  superiorization  to  constrained  optimization  problems  is 
very  large.  Its  major  utility  is  in  the  automatic  nature  of  producing  a  superiorization  algorithm  from  an 
algorithm  aimed  at  only  constraints-compatibility;  while  nonheuristic  (exact)  approaches  need  to  be 
redesigned  for  a  new  optimization  criterion.  Thus  superiorization  provides  a  quick  route  to  algorithms 
for  the  practical  solution  of  constrained  optimization  problems.  ©  2012  American  Association  of 
Physicists  in  Medicine.  [http://dx.doi.org/10T  1 18/1.4745566] 

Key  words:  superiorization,  constrained  optimization,  heuristic  optimization,  tomography,  total 
variation 


I.  INTRODUCTION 

Optimization  is  a  tool  that  is  used  in  many  areas  of  Medi¬ 
cal  Physics.  Prime  examples  are  radiation  therapy  treatment 
planning  and  tomographic  reconstruction,  but  there  are  others 
such  as  image  registration.  Some  well-cited  classical  publica¬ 
tions  on  the  topic  are  Refs.  1-12  and  some  recent  articles  are 
Refs.  13-26. 

In  a  typical  medical  physics  application,  one  uses  con¬ 
strained  optimization,  where  the  constraints  come  from  the 


desire  to  produce  a  solution  that  is  constraints-compatible,  in 
the  sense  of  meeting  the  requirements  provided  by  physically 
or  otherwise  obtained  constraints.  In  radiation  therapy  treat¬ 
ment  planning,  the  requirements  are  usually  in  the  form  of 
constraints  prescribed  by  the  treatment  planner  on  the  doses 
to  be  delivered  at  specific  locations  in  the  body.  These  doses 
in  turn  depend  on  information  provided  by  an  imaging  in¬ 
strument,  typically  a  magnetic  resonance  imaging  (MRI)  or 
a  computerized  tomography  (CT)  scanner.  In  tomography,  the 
constraints  come  from  the  detector  readings  of  the  instrument. 


5532  Med.  Phys.  39  (9),  September  2012 


0094-2405/201 2/39(9)/5532/1 5/$30.00 


©  2012  Am.  Assoc.  Phys.  Med.  5532 


5533 


Herman  et  al.\  Superiorization:  An  optimization  heuristic  for  medical  physics 


5533 


In  such  applications,  it  is  typically  the  case  that  a  large  num¬ 
ber  of  solutions  would  be  considered  good  enough  from  the 
point  of  view  of  being  constraints-compatible;  to  a  large  ex¬ 
tent,  but  not  entirely,  due  to  the  fact  that  there  is  uncertainty 
as  to  the  exact  nature  of  the  constraints  (for  example,  due  to 
noise  in  the  data  collection).  In  such  a  case,  an  optimization 
criterion  is  introduced  that  helps  us  to  distinguish  the  “better” 
constraints-compatible  solutions  (for  example,  this  criterion 
could  be  the  total  dose  to  be  delivered  to  the  body,  which  may 
vary  quite  a  bit  between  radiation  therapy  treatment  plans  that 
are  compatible  with  the  constraints  on  the  doses  delivered  to 
individual  locations). 

The  superiorization  methodology  (see,  for  example. 
Refs.  22  and  27-32)  is  a  recently  developed  heuristic  ap¬ 
proach  to  optimization.  The  word  heuristic  is  used  here  in 
the  sense  that  the  process  is  not  guaranteed  to  lead  to  an  op¬ 
timum  according  to  the  given  criterion;  approaches  aimed  at 
processes  that  are  guaranteed  in  that  sense  are  usually  referred 
to  as  exact.  Heuristic  approaches  have  been  found  useful  in 
practical  applications  of  optimization,  mainly  because  they 
are  often  computationally  much  less  expensive  than  their  ex¬ 
act  counterparts,  but  nevertheless  provide  solutions  that  are 
appropriate  for  the  application  at  hand.33-35 

The  underlying  idea  of  the  superiorization  approach  is  the 
following.  In  many  applications  there  exists  a  computation¬ 
ally  efficient  iterative  algorithm  that  produces  a  constraints- 
compatible  solution  for  the  given  constraints.  (An  example 
of  this  for  radiation  therapy  treatment  planning  is  reported 
in  Ref.  36,  its  clinical  use  is  discussed  in  Ref.  15.)  Fur¬ 
thermore,  often  the  algorithm  is  perturbation  resilient  in  the 
sense  that,  even  if  certain  kinds  of  changes  are  made  at 
the  end  of  each  iterative  step,  the  algorithm  still  produces 
a  constraints-compatible  solution.27-30  This  property  is  ex¬ 
ploited  in  the  superiorization  approach  by  using  such  pertur¬ 
bations  to  steer  the  algorithm  to  a  solution  that  is  not  only 
constraints-compatible,  but  is  also  desirable  according  to  a 
specified  optimization  criterion.  The  approach  is  very  general, 
it  is  applicable  to  many  iterative  procedures  and  optimization 
criteria. 

The  current  paper  presents  a  major  advance  in  the 
practice  and  theory  of  superiorization.  The  previous 
publications22'27-'-  used  the  intuitive  idea  to  present  some  su¬ 
periorization  algorithms,  in  this  paper  the  reader  will  find  a  to¬ 
tally  automatic  procedure  that  turns  an  iterative  algorithm  into 
its  superiorized  version.  This  version  will  produce  an  output 
that  is  as  constraints-compatible  as  the  output  of  the  original 
algorithm,  but  it  is  superior  to  that  according  to  an  optimiza¬ 
tion  criterion.  This  claim  is  mathematically  shown  to  be  true 
for  a  very  large  class  of  iterative  algorithms  and  for  optimiza¬ 
tion  criteria  in  general,  typical  restrictions  (such  as  convexity) 
on  the  optimization  criterion  are  not  essential  for  the  material 
presented  below.  In  order  to  make  precise  and  validate  this 
broad  claim,  we  present  here  a  new  theoretical  framework. 
The  framework  of  Ref.  29  is  a  precursor  of  what  we  present 
here,  but  it  is  a  restricted  one,  since  it  assumes  that  the  con¬ 
straints  can  be  all  satisfied  simultaneously,  which  is  often  false 
in  medical  physics  applications.  There  is  no  such  restriction 
in  the  presentation  below. 


The  idea  of  designing  algorithms  that  use  interlacing  steps 
of  two  different  kinds  (in  our  case,  one  kind  of  steps  aim  at 
constraints-compatibility  and  the  other  kind  of  steps  aim  at 
improvement  of  the  optimization  criterion)  is  well-established 
and,  in  fact,  is  made  use  of  in  many  approaches  that  have 
been  proposed  with  exact  constrained  optimization  in  mind; 
see,  for  example,  the  works  of  Helou  Neto  and  De  Fierro, 
Nurminski,39  Combettes  and  co-workers,40,41  Sidky  and  co¬ 
workers, 23,42'43  and  Defrise  and  co-workers.44  However,  none 
of  these  approaches  can  do  what  can  be  done  by  the  superi¬ 
orization  approach  as  presented  below,  namely,  the  automatic 
production  of  a  heuristic  constrained  optimization  algorithm 
from  an  iterative  algorithm  for  constraints-compatibility.  For 
example,  in  Ref.  37  it  is  assumed  (just  as  in  the  theory  pre¬ 
sented  in  our  Ref.  29)  that  all  the  constraints  can  be  satisfied 
simultaneously. 

A  major  motivator  for  the  additional  theory  presented  in 
the  current  paper  is  to  get  rid  of  this  assumption,  which 
is  not  reasonable  when  handling  real  problems  of  medical 
physics.  Motivated  by  similar  considerations,  Helou  Neto  and 
De  Fierro3'  present  an  alternative  approach  that  does  not 
require  this  unreasonable  assumption.  However,  in  order  to 
solve  such  a  problem,  they  end  up  with  iterative  algorithms 
of  a  particular  form  rather  than  having  the  generality  of  be¬ 
ing  able  to  turn  any  constraints-compatibility  seeking  algo¬ 
rithm  into  a  superiorized  one  capable  of  handling  constrained 
optimization.  Also,  the  assumptions  they  have  to  make  in 
order  to  prove  their  convergence  result  (their  Theorem  15) 
indicate  that  their  approach  is  applicable  to  a  smaller  class 
of  constrained  optimization  problems  than  the  superioriza¬ 
tion  approach  whose  applicability  seems  to  be  more  general. 
However,  for  the  mathematical  purist,  we  point  out  that  they 
present  an  exact  constrained  optimization  algorithm,  while 
superiorization  is  a  heuristic  approach.  Whether  this  is  rel¬ 
evant  to  medical  physics  practice  is  not  clear:  exact  algo¬ 
rithms  are  not  run  forever,  but  are  stopped  according  to  some 
stopping-rule,  the  relevant  questions  in  comparing  two  algo¬ 
rithms  are  the  quality  of  the  actual  output  and  the  computation 
time  needed  to  obtain  it. 

Ultimately,  the  quality  of  the  outputs  should  be  evaluated 
by  some  figures  of  merit  relevant  to  the  medical  task  at  hand. 
An  example  of  a  careful  study  of  this  kind  that  involves  su¬ 
periorization  is  in  Sec.  4.3  of  Ref.  30,  which  reports  on  com¬ 
paring  in  CT  the  efficacy  of  constrained  optimization  recon¬ 
struction  algorithms  for  the  detection  of  low-contrast  brain 
tumors  by  using  the  method  of  statistical  hypothesis  testing 
(which  provides  a  P-value  that  indicates  the  significance  by 
which  we  can  reject  the  null  hypothesis  that  the  two  algo¬ 
rithms  are  equally  efficacious  in  favor  of  the  alternative  that 
one  is  preferable).  Such  studies  bundle  together  two  things: 

(i)  the  formulation  of  the  constrained  optimization  task  and 

(ii)  the  performance  of  the  algorithm  in  performing  that  task. 
The  first  of  these  requires  a  translation  of  the  medical  aim  into 
a  mathematical  model,  it  is  important  that  this  model  should 
be  appropriately  chosen. 

The  superiorization  approach  is  not  about  choosing  this 
model,  it  kicks  in  once  the  model  is  chosen  and  aims 
at  producing  an  output  that  is  “good”  according  to  the 
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mathematical  specifications  of  the  constraints  and  of  the 
optimization  criterion.  Thus  superiorization  has  been  used 
to  compare  the  effects  on  the  quality  of  the  output  in  CT 
when  the  optimization  criterion  is  specified  by  total  vari¬ 
ation  (TV)  versus  by  entropy28  or  versus  by  the  l  \  -norm 
of  the  Haar  transform.  '2  However,  the  current  paper  is  not 
about  discussing  how  to  translate  the  underlying  medical 
physics  task  into  a  constrained  optimization  problem.  For 
our  purposes  here,  we  are  assuming  that  the  mathematical 
model  has  been  worked  out  and  concentrate  on  the  algo¬ 
rithmic  approach  for  solving  the  resulting  constrained  op¬ 
timization  problem.  We  claim  that  the  evaluation  of  such 
algorithms  should  not  be  based  on  the  medical  figures  of 
merit  mentioned  at  the  beginning  of  the  previous  paragraph, 
but  rather  on  their  performance  in  solving  the  mathemat¬ 
ical  problem.  If  “good”  solutions  to  the  constrained  opti¬ 
mization  problem  are  not  medically  efficacious,  that  indi¬ 
cates  that  something  is  wrong  with  the  mathematical  model 
and  not  that  something  is  wrong  with  the  algorithmic  ap¬ 
proach.  For  this  reason,  in  this  paper  we  will  not  carry  out 
a  careful  investigation  of  the  medical  efficacy  of  any  algo¬ 
rithm  in  the  manner  that  we  have  done  in  Sec.  4.3  of  Ref.  30, 
but  will  restrict  ourselves  to  a  simple  illustration  of  the  per¬ 
formance  of  the  superiorization  approach  as  compared  to  the 
previously  published  algorithm  of  Ref.  42  that  is  aimed  at  per¬ 
forming  exact  minimization. 

Examples  of  such  studies  already  exist.  Superiorization 
was  compared  in  Ref.  27  with  Algorithm  6  of  Ref.  40  and  in 
Ref.  45  with  the  algorithm  of  Goldstein  and  Osher  that  they 
refer  to  as  TwIST  (Ref.  46)  with  split  Bregman47  as  the  sub¬ 
step.  In  both  cases  the  implementation  was  done  by  the  pro¬ 
posers  of  the  algorithms.  In  these  reported  instances  superi¬ 
orization  did  well:  the  constraints-compatibility  and  the  value 
of  the  function  to  be  minimized  were  very  similar  for  the  out¬ 
puts  produced  by  the  algorithms  being  compared,  but  the  su¬ 
periorization  algorithm  produced  its  output  four  times  faster 
than  the  alternative.  It  would  be  unjustified  to  draw  any  gen¬ 
eral  conclusions  on  the  mathematical  performance  and  speed 
of  superiorization  based  on  just  a  few  experiments,  but  the 
reported  results  are  encouraging. 

However,  the  main  reason  why  we  advocate  superioriza¬ 
tion  is  different  from  what  is  discussed  above.  The  reason 
why  we  claim  it  to  be  helpful  in  medical  physics  research 
is  that  it  has  the  potential  of  saving  a  lot  of  time  and  ef¬ 
fort  for  the  researcher.  Let  us  consider  a  historical  example. 
Likelihood  optimization  using  the  iterative  process  of  expec¬ 
tation  maximization  (EM)  (Ref.  48)  gained  immediate  and 
wide  acceptance  in  the  emission  tomography  community.  It 
was  observed  that  irregular  high  amplitude  patterns  occurred 
in  the  image  with  a  large  number  of  iterations,  but  it  was 
not  until  five  years  later  that  this  problem  was  corrected44 
by  the  use  of  a  maximum  a  posteriority  probability  (MAP) 
algorithm  with  a  multivariate  Gaussian  prior.  Had  we  had 
at  our  disposal  the  superiorization  approach,  then  the  intro¬ 
duction  of  an  optimization  criterion  (Gaussian  or  other)  into 
the  iterative  EM  process  would  have  been  a  simple  matter 
and  we  would  have  saved  the  time  and  effort  spent  on  de¬ 
signing  a  special  purpose  algorithm  for  the  MAP  formula¬ 


tion.  A  TV -superiorization  of  the  EM  algorithm  is  presented 
in  Ref.  50. 

Even  though  our  major  claim  for  superiorization  is  that  it 
provides  a  quick  route  to  algorithms  for  the  practical  solution 
of  constrained  optimization  problems,  before  leaving  this  in¬ 
troduction  let  us  bring  up  a  question  that  has  to  do  with  the 
performance  of  the  resulting  algorithms:  Will  superiorization 
produce  superior  results  to  those  produced  by  contemporary 
MAP  methods  or  is  it  faster  than  the  better  of  such  methods? 
At  this  stage  we  have  not  yet  developed  the  mathematical  no¬ 
tation  to  discuss  this  question  in  a  rigorous  manner,  we  return 
to  it  in  Subsection  ILF. 

In  Sec.  Ill,  we  present  in  detail  the  superiorization  method¬ 
ology.  In  Sec.  Ill,  we  provide  an  illustrative  example  by  re¬ 
porting  on  reconstructions  produced  by  algorithms  applied  to 
simulated  computerized  tomography  data  of  a  head  cross  sec¬ 
tion.  In  the  final  section,  we  discuss  our  results  and  present 
our  conclusions. 

II.  THE  SUPERIORIZATION  METHODOLOGY 

II. A.  Problem  sets,  proximity  functions,  and 
e-compatibility 

Although  optimization  is  often  studied  in  a  more  general 
context  (such  as  in  Hilbert  or  Banach  spaces),  in  medical 
physics  we  usually  deal  with  a  special  case,  where  optimiza¬ 
tion  is  performed  in  a  Euclidean  space  JR7  (the  space  of  ./- 
dimensional  vectors  of  real  numbers,  where  /  is  a  positive  in¬ 
teger).  As  often  appropriate  in  practice,  we  further  restrict  the 
domain  of  optimization  to  a  nonempty  subset  12  of  R7  (such 
as  the  non-negative  orthant  My  that  consists  of  vectors  all  of 
whose  components  are  non-negative). 

We  now  turn  to  formalizing  the  notion  of  being  compatible 
with  given  constraints,  a  notion  that  we  have  used  informally 
in  Sec.  I.  In  any  application,  there  is  a  problem  set  T ;  each 
problem  T  e  T  is  essentially  a  description  of  the  constraints 
in  that  particular  case.  For  example,  for  a  tomographic 
scanner,  the  problem  of  reconstruction  for  a  particular  patient 
at  a  particular  time  is  determined  by  the  measurements  taken 
by  the  scanner  for  that  patient  at  that  time.  The  intuitive 
notion  of  constraints-compatibility  is  formalized  by  the  use 
of  a  proximity  function  Vr  on  T  such  that,  for  every  lei, 
Vrj  maps  12  into  R+,  the  set  of  non-negative  real  numbers; 
i.e.,  Vrj  :  12  — »■  M+.  Intuitively,  we  think  of  Vrj(x)  as  an 
indicator  of  how  incompatible  x  is  with  the  constraints  of  T. 
For  example,  in  tomography,  VrT(x)  should  indicate  by  how 
much  a  proposed  reconstruction  that  is  described  by  an  x  in 
12  violates  the  constraints  of  the  problem  T  that  are  provided 
by  the  measurements  taken  by  the  scanner.  For  example,  if 
we  use  b  to  denote  the  vector  of  estimated  line  integrals  based 
on  the  measurements  obtained  by  the  scanner  and  by  A  the 
system  matrix  of  the  scanner,  then  a  possible  choice  for  the 
proximity  function  is  the  norm-distance  \\b  —  Ajt||,  which 
we  will  use  as  an  example  in  the  discussions  that  follow.  An 
alternative  legitimate  choice  for  the  proximity  function  is  the 
Kullback-Leibler  distance  KL(b,  Ax),  which  is  the  negative 
log-likelihood  of  a  statistical  model  in  tomography.  The 
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special  case  Vrr(x)  —  0  is  interpreted  by  saying  that  x  is 
perfectly  compatible  with  the  constraints;  due  to  the  presence 
of  noise  in  practical  applications,  it  is  quite  conceivable  that 
there  is  no  x  that  is  perfectly  compatible  with  the  constraints, 
and  we  accept  an  x  as  constraints-compatible  as  long  as  the 
value  of  Vrr(x)  is  considered  to  be  small  enough  to  justify 
that  decision.  Combining  these  two  concepts  leads  to  the 
notion  of  a  problem  structure,  which  is  a  pair  (T,  Vr),  where 
T  is  a  nonempty  problem  set  and  Vr  is  a  proximity  function 
on  T.  For  a  problem  structure  (T,  Vr),  a  problem  T  e  T,  a 
non-negative  e,  and  an  x  e  £2,  we  say  that  x  is  e-compatible 
with  T  provided  that  Vr/  ix)  <  e. 

As  an  example  (whose  applicability  to  tomographic  re¬ 
construction  is  illustrated  in  Sec.  Ill),  consider  the  problem 
structure  that  arises  from  the  desire  to  find  non-negative  so¬ 
lutions  of  sequences  of  blocks  of  linear  equations.  Then  the 
appropriate  choices  are  Q  =  IR.7  and  the  problem  structure  is 
(S,  Res),  where  the  problem  set  §  is 

S  =  {({(«',  bx),...,{al\bh )},..., 


u  \  (nh+-+tv/  u  \i\| 

u®  Vil+...+iw_\+\),  ■  ■  ■  Aa  Vti+...+tw)i)\ 


W  is  a  positive  integer  and, 


for  1  <  w  <  W,  iw  is  a  positive  integer  and, 

for  1  <  i  <  l\  +  . . .  +  lw ,  a'  e  R7  and  bt  e  R}  (1) 


and  the  proximity  function  Res  on  S  is  defined,  for  any 
problem  S  =  ({(a1,  b\),. . . ,  (atl ,  b^)}, . . . ,  {(afl+-+^-1+1, 
bi1+...+iw_1+\), . . . ,  (ah+  -+lw ,  bii+,„+tw)})  in  S  and  for  any 
x  e  £2,  by 


Ress(x)=  (bi  ~  {a‘,x))2.  (2) 

N  i=i 


Note  that  each  element  of  this  problem  set  S  specifies  an 
ordered  sequence  of  W  blocks  of  linear  equations  of  the  form 
(a1 ,  x)  =  bi  where  (*,*)  denotes  the  inner  product  in  R7  (and 
thus  S  is  an  appropriate  representation  of  the  so-called  “or¬ 
dered  subsets”  approach  to  tomographic  reconstruction,51  as 
well  as  of  other  earlier-published  block-iterative  methods  that 
proposed  essentially  the  same  idea'’2-54).  The  proximity  func¬ 
tion  Res  on  S  is  the  residual  that  we  get  when  a  particular*  is 
substituted  into  all  the  equations  of  a  particular  problem  S. 


Selecting  !T2  =  R7  and  A  =  R7  for  the  problem  structure 
(S,  Res)  of  Subsection  II. A,  an  example  of  an  algorithm  R  is 
specified  by 

Rsx  =  QBSw  BSlx,  (3) 


where  S  is  the  problem  specified  above  in  Eq.  (2)  and,  for 
l  <  w  <  W,  :  A  A  is  defined  by 


Be  *  =  * 


T  E 


bi  -  (a',x) 


(4) 


i— £i+...+£u>-i+l 


where  ||a||  denotes  the  norm  of  the  vector  a  in  R7,  and  Q  : 
A  — »■  £2  is  defined  by 

(Q x)j  =  max{0,  Xj},  for  1  <  j  <  J.  (5) 

Note  that  R5  :  A  —*■  £2.  This  specific  algorithm  R  is  a  typ¬ 
ical  example  of  the  so-called  block-iterative  methods  men¬ 
tioned  above.  Except  for  the  presence  of  Q  in  Eq.  (3),  which 
enforces  non-negativity  of  the  components,  it  is  identical  to 
an  algorithm  used  and  illustrated  in  Ref.  31.  With  the  Q  ab¬ 
sent  from  the  definition  of  the  algorithm,  £2  has  to  be  the 
whole  of  R7;  the  practical  consequence  of  the  presence  ver¬ 
sus  the  absence  of  Q  in  the  tomographic  application  is  illus¬ 
trated  in  Subsection  III.D.  We  also  note  that  special  cases  of 
the  presented  algorithm  include  the  classical  reconstruction 
methods  such  as  algebraic  reconstruction  technique  (ART)  (if 
lw  =  1,  for  1  <  w  <  W )  and  SIRT  (if  W  =  1);  see,  for  ex¬ 
ample,  Chaps.  11  and  12  of  Ref.  55. 

For  a  problem  structure  (T  ,Vr),  a  T  el,  an  ee  R+, 
and  a  sequence  R  —  (xk  )fi{)  of  points  in  Q,  we  use  (9(7, 
e,  R)  to  denote  the  x  e  Q  that  has  the  following  properties: 
VrT(x)  <  e  and  there  is  a  non-negative  integer  K  such  that 
xK  —  x  and,  for  all  non-negative  integers  k  <KVrT{xk)  >  e. 
Clearly,  if  there  is  such  an  x,  then  it  is  unique.  If  there  is  no 
such  x,  then  we  say  that  0(T,  e,  R)  is  undefined,  otherwise 
we  say  that  it  is  defined.  The  intuition  behind  this  definition 
is  the  following:  if  we  think  of  R  as  the  (infinite)  sequence 
of  points  that  is  produced  by  an  algorithm  (intended  for  the 
problem  T)  without  a  termination  criterion,  then  (9(7’,  e,  R)  is 
the  output  produced  by  that  algorithm  when  we  add  to  it  in¬ 
structions  that  make  it  terminate  as  soon  as  it  reaches  a  point 
that  is  e-compatible  with  T. 


II. B.  Algorithms  and  outputs 

We  now  define  the  concept  of  an  algorithm  in  the  general 
context  of  problem  structures.  For  technical  reasons  that  will 
become  clear  as  we  proceed  with  our  development,  we  intro¬ 
duce  an  additional  set  A,  such  that  C  A  C  R7.  (Both  Q 
and  A  are  assumed  to  be  known  and  fixed  for  any  particu¬ 
lar  problem  structure  (T ,  Vr).)  An  algorithm  P  for  a  problem 
structure  (T  ,Vr)  assigns  to  each  problem  T  e  T  an  oper¬ 
ator  P7-  :  A  —*■  Q,.  This  definition  is  used  to  define  iterative 
processes  that,  for  any  initial  point  x  e  Q.  produce  the  (po¬ 
tentially)  infinite  sequence  ((PtO^x)^  (that  is,  the  sequence 
x,  PyX,  Py(PyX),  •  ■  ■)  of  points  in  f2.  We  discuss  below  how 
such  a  potentially  infinite  process  is  terminated  in  practice. 


II.C.  Bounded  perturbation  resilience 

The  notion  of  a  bounded  perturbations  resilient  algorithm 
P  for  a  problem  structure  (T ,  Vr)  has  been  defined  in  a  math¬ 
ematically  precise  manner.29  However,  that  definition  is  not 
satisfactory  from  the  point  of  view  of  applications  in  medical 
physics  (or  indeed  in  any  area  involving  noisy  data),  because 
it  is  useful  only  for  problems  T  for  which  there  is  a  perfectly 
compatible  solution  (that  is,  an  x  such  that  Vrr (x)  =  0).  We 
therefore  extend  here  that  notion  as  follows.  An  algorithm  P 
for  a  problem  structure  (T,  Vr)  is  said  to  be  strongly  pertur¬ 
bation  resilient  if,  for  all  T  e  T , 

(i)  there  exists  an  e  e  R+  such  that  0(T,  e,  ((PT)kx)ff0) 
is  defined  for  every  x  e  £2; 
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(ii)  for  all  s  e  R+  such  that  0{T,  s,  ((PT)kx)ff0)  's  de- 
fined  for  every  r  e  we  also  have  that  0(7’,  s',  R ) 
is  defined  for  every  s'  >  s  and  for  every  sequence 
R  —  (xk)f={]  of  points  in  generated  by 

x*+1  =  Pr(x*  +  favk),  for  all  k  >  0,  (6) 

where  f kvk  are  bounded  perturbations,  meaning  that 
the  sequence  (fk)ff 0  of  non-negative  real  numbers 

Eoo 

<  oo),  the  sequence 

(»*)£o  °f  vectors  in  Ry  is  bounded  and,  for  all 
k>0,xk  +  fkvk  e  A. 

In  less  formal  terms,  the  second  of  these  properties  says 
that  for  a  strongly  perturbation  resilient  algorithm  we  have 
that,  for  every  problem  and  any  non-negative  real  number  s, 
if  it  is  the  case  that  for  all  initial  points  from  !T2  the  infinite  se¬ 
quence  produced  by  the  algorithm  contains  an  e-compatible 
point,  then  it  will  also  be  the  case  that  all  perturbed  sequences 
satisfying  Eq.  (6)  contain  an  e'-compatible  point,  for  any 
s'  >  e. 

Having  defined  the  notion  of  a  strongly  perturbation  re¬ 
silient  algorithm,  we  next  show  that  this  notion  is  of  relevance 
to  problems  in  medical  physics.  We  illustrate  the  use  of  this 
in  tomography  in  Sec.  III.  We  first  need  to  introduce  some 
mathematical  concepts. 

Given  an  algorithm  P  for  a  problem  structure  (T ,  Vr)  and 
a  T  e  T ,  we  say  that  P  is  convergent  for  T  if,  for  every  red, 
there  exists  a  unique  y(x)  e  £2  such  that,  lim^ocXPr)4* 
=  y(x),  meaning  that  for  every  positive  real  number  <5,  there 
exists  a  non-negative  integer  K,  such  that  ||(P7’)<:x  —  y(x)|| 
<  <5,  for  all  non-negative  integers  k  >  K.  If,  in  addition,  there 
exists  a  ye  R+  such  that  VrT(y(x))  <  y,  for  every  x  e  £2, 
then  we  say  that  P  is  boundedlv  convergent  for  T. 

A  function  f  :  Q  R  is  uniformly  continuous  if,  for  ev¬ 
ery  s  >  0  there  exists  a  8  >  0,  such  that,  for  all  x,  y  e  Q, 
|/(x)  —  f(y)\  <  s  provided  that  ||x  —  _y||  <  <5.  An  example 
of  a  uniformly  continuous  function  is  Ress  of  Eq.  (2),  for 
any  S  e  S.  This  can  be  proved  by  observing  that  the  right- 
hand  side  of  Eq.  (2)  can  be  rewritten  in  vector/matrix  form 
as  \\b  —  Ax ||  and  then  selecting,  for  any  given  s  >  0,  5  to  be 
e/ 1|  A  ||,  where  ||  A||  denotes  the  matrix  norm  of  A. 

An  operator  O:  A  — »•  f2,  is  nonexpansive  if  ||Ox  —  0_y| 
<  || x  —  _y  || ,  for  all  x,  y  e  A.  An  example  of  a  nonexpansive 
operator  is  the  Ry  of  Eq.  (3).  The  proof  of  this  is  also  sim¬ 
ple.  It  follows  from  discussions  regarding  similar  claims  in 
Ref.  27  that  the  BS|i  :  Ry  -*  Ry  of  Eq.  (4)  is  a  nonexpan¬ 
sive  operator,  for  1  <  w  <  W,  and  that  the  operator  Q  of 
Eq.  (5)  is  also  nonexpansive.  Obviously,  a  sequential  appli¬ 
cation  of  nonexpansive  operators  results  in  a  nonexpansive 
operator  and  thus  Ry  is  nonexpansive. 

Now  we  state  an  important  new  result  that  gives  suffi¬ 
cient  conditions  for  strong  perturbation  resilience:  If  P  is 
an  algorithm  for  a  problem  structure  (T  ,Vr)  such  that,  for 
all  T  e  T,  P  is  boundedly  convergent  for  T,  VrT  :  Q,  — >-  R 
is  uniformly  continuous,  and  Pr  :  A  — ►  £2  is  nonexpansive, 
then  P  is  strongly  perturbation  resilient.  The  importance  of 
this  result  lies  in  the  fact  that  the  rather  ordinary  condition  of 
uniform  continuity  for  the  proximity  function  and  the  reason¬ 


able  conditions  of  bounded  convergence  and  nonexpansive- 
ness  of  the  algorithmic  operators  guarantee  that  we  end  up 
with  a  strongly  perturbation  resilient  algorithm.  The  proof  of 
this  new  result  involves  some  mathematical  technicalities  and 
is  therefore  presented  in  the  Appendix  as  Theorem  1 . 

II. D.  Optimization  criterion  and  nonascending  vector 

Now  suppose,  as  is  indeed  the  case  for  the  constrained 
optimization  problems  discussed  in  Sec.  I,  that  in  addition 
to  a  problem  structure  (T  ,Vr)  we  are  also  provided  with 
an  optimization  criterion,  which  is  specified  by  a  function 
</>  :  A  —*■  R,  with  the  convention  that  a  point  in  A  for  which 
the  value  of  ([>  is  smaller  is  considered  superior  (from  the  point 
of  view  of  our  application)  to  a  point  in  A  for  which  the  value 
of  tp  is  larger.  In  the  tomography  context,  any  of  the  functions 
of  x  that  are  listed  as  a  “secondary  optimization  criterion”  (an 
alternative  name  is  a  “regularizer”)  in  Sec.  6.4  of  Ref.  55  is  an 
acceptable  choice  for  the  optimization  criterion  cp.  These  in¬ 
clude  weighted  norms,  the  negative  of  Shannon’s  entropy  and 
total  variation.  It  is  the  last  of  these  that  we  discuss  in  detail  in 
the  illustrative  example  below.  The  essential  idea  of  the  supe¬ 
riorization  methodology  presented  in  this  paper  is  to  make  use 
of  the  perturbations  of  Eq.  (6)  to  transform  a  strongly  pertur¬ 
bation  resilient  algorithm  that  seeks  a  constraints-compatible 
solution  into  one  whose  outputs  are  equally  good  from  the 
point  of  view  of  constraints-compatibility,  but  are  superior 
according  to  the  optimization  criterion.  We  do  this  by  pro¬ 
ducing  from  the  algorithm  another  one,  called  its  superi- 
orized  version,  by  making  sure  not  only  that  the  fkvk  are 
bounded  perturbations,  but  also  that  cp(xk  +  fkvk)  <  <P(xk), 
for  all  k  >  0. 

In  order  to  ensure  this  we  introduce  a  new  concept  (closely 
related  to  the  concept  of  a  “descent  direction”  that  is  widely 
used  in  optimization).  Given  a  function  (p  :  A  —>  R  and  a 
point  x  e  A,  we  say  that  a  vector  d  e  Ry  is  nonascending 
for  tp  at  x  if  || d ||  <  1  and 

there  is  a  8  >0  such  that  for  all  X  e  [0,  <5], 

(7) 

(x  +  Xd)  e  A  and  tp(x  +  Xd)  <  (pix). 

Note  that  irrespective  of  the  choices  of  tp  and  x,  there  is  al¬ 
ways  at  least  one  nonascending  vector  d  for  tp  at  x,  namely,  the 
zero-vector,  all  of  whose  components  are  zero.  This  is  a  useful 
fact  for  proving  results  concerning  the  guaranteed  behavior  of 
our  proposed  procedures.  However,  in  order  to  steer  our  algo¬ 
rithms  towards  a  point  at  which  the  value  of  tp  is  small,  we 
need  to  find  a  d  such  that  <p{x  +  Xd)  <  (pix)  rather  than  just 
<p(x  +  Xd)  <  (pix)  as  in  Eq.  (7).  In  some  earlier  papers  on 
superiorization27-31  it  was  assumed  that  A  =  Ry  and  that  (p 
is  a  convex  function.  This  implied  that,  for  any  point  x  e  A, 
tp  had  a  subgradient  g  e  Ry  at  the  point  x.  It  was  suggested 
that  if  there  is  such  a  g  with  a  positive  norm,  then  d  should 
be  chosen  to  be  —  g/||g||,  otherwise  d  should  be  chosen  to  be 
the  zero  vector.  However,  there  are  approaches  (not  involving 
subgradients)  to  selecting  an  appropriate  d\  an  example  can  be 
found  in  Ref.  32  in  which  d  is  found  without  using  subgradi¬ 
ents  for  the  case  when  cp  is  the  l  \ -norm  of  the  Haar  transform. 
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The  method  we  used  for  selecting  a  nonascending  vector  in 
the  experiments  reported  in  this  paper  is  specified  at  the  end 
of  Subsection  III. A. 

II. E.  Superiorized  version  of  an  algorithm 

We  now  make  precise  the  ingredients  needed  for  trans¬ 
forming  an  algorithm  into  its  superiorized  version.  Let  Q  and 
A  be  the  underlying  sets  for  a  problem  structure  (T  ,Vr) 
(Q  C  A  C  Ry,  as  discussed  at  the  beginning  of  Subsec¬ 
tion  II. B),  P  be  an  algorithm  for  (T ,  Vr)  and  (p  '■  A  —*■  R. 
The  following  description  of  the  Superiorized  Version  of 
Algorithm  P  produces,  for  any  problem  T  e  T,  a  sequence 
Rt  —  (xk)<jf=0  of  points  in  ^  for  which,  for  all  k  >  0,  Eq.  (6) 
is  satisfied.  We  show  this  to  be  true,  for  any  algorithm  P,  after 
the  description  of  the  Superiorized  Version  of  Algorithm  P. 
Furthermore,  since  the  sequence  Rt  is  steered  by  Superiorized 
Version  of  Algorithm  P  towards  a  reduced  value  of  <p,  there 
is  an  intuitive  expectation  that  the  output  of  the  superiorized 
version  is  likely  to  be  superior  (from  the  point  of  view  of 
the  optimization  criterion  </;)  to  the  output  of  the  original 
unperturbed  algorithm.  This  last  statement  is  not  precise  and 
so  it  cannot  be  proved  in  a  mathematical  sense  for  an  arbitrary 
algorithm  P;  however,  that  should  not  stop  us  from  applying 
the  easy  procedure  given  below  for  automatically  producing 
the  superiorized  version  of  P  and  experimentally  checking 
whether  it  indeed  provides  us  with  outputs  superior  to  those 
of  the  original  algorithm.  The  well-demonstrated  nature  of 
heuristic  optimization  approaches  is  that  they  often  work  in 
practice  even  when  their  performance  cannot  be  guaranteed 
to  be  optimal.  ’ 

Nevertheless,  we  can  push  our  theory  further  than  the  hope 
expressed  in  the  last  paragraph,  by  considering  superiorized 
versions  of  algorithms  that  satisfy  some  condition.  In  this  pa¬ 
per,  the  condition  that  we  discuss  is  strong  perturbation  re¬ 
silience.  We  show  below  that  if  P  is  strongly  perturbation 
resilient,  then,  for  any  problem  T  e  T,  a  sequence  R/  pro¬ 
duced  by  its  superiorized  version  has  the  following  desirable 
property:  For  all  s  e  R+,  if  0{T,  e,  ((Pr/^l^o)  *s  defined 
for  every  x  e  £2,  then  0(T,  s',  Rt)  is  also  defined  for  every 
s'  >  s\  in  other  words,  the  Superiorized  Version  of  Algorithm 
P  provides  an  e'-compatible  output.  As  stated  above,  the  ad¬ 
vantage  of  the  superiorized  version  is  that  its  output  is  likely 
to  be  superior  to  the  output  of  the  original  unperturbed  al¬ 
gorithm.  We  point  out  that  strong  perturbation  resilience  is  a 
sufficient,  but  not  necessary,  condition  for  guaranteeing  such 
desirable  behavior  of  the  superiorized  version,  finding  addi¬ 
tional  sufficient  conditions  and  proving  that  algorithms  that 
we  wish  to  superiorize  satisfy  such  conditions  is  part  of  our 
ongoing  research. 

The  superiorized  version  assumes  that  we  have  available 
a  summable  sequence  (ye)%f0  of  positive  real  numbers  (for 
example,  y k  —  </,  where  0  <  a  <  1)  and  it  generates,  simul¬ 
taneously  with  the  sequence  (xk)f=0,  sequences  (i/')£L0,  and 
The  latter  is  generated  as  a  subsequence  of  (yr)^L0, 
resulting  in  a  summable  sequence  (/3a  )£L0-  The  algorithm  fur¬ 
ther  depends  on  a  specified  initial  point  x  e  £2  and  on  a  posi¬ 
tive  integer  N.  It  makes  use  of  a  logical  variable  called  loop. 


Superiorized  Version  of  Algorithm  P 

(i)  set  k  —  0 

(ii)  set  xk  =  x 

(iii)  set  £  =  —  1 

(iv)  repeat 

(v)  set  n  =  0 

(vi)  set  xk-n  —  xk 

(vii)  while  n  <  N 

(viii)  set  vk,n  to  be  a  nonascending  vector  for  (f)  at 

Xk'n 

(ix)  set  loop  —  true 

(x)  while  loop 

(xi)  set  £  =  £  +  1 

(xii)  set  /3kt  „  —  ye 

(xiii)  set  z  —  xk’n  +  fik,n  vk'n 

(xiv)  if  z  e  A  and  <p(z)  <  <p(xk),  then 

(xv)  set  n  —  n  +  1 

(xvi)  set  xk'n  =  z 

(xvii)  set  loop  =  false 

(xviii)  set  =  PTxkiN 

(xix)  set  k  =  k  +  1 . 

Next  we  analyze  the  behavior  of  the  Superiorized  Version  of 
Algorithm  P. 

The  iteration  number  k  is  set  to  0  in  (i)  and  xk  =  x°  is  set 
to  its  initial  value  x  in  (ii).  The  integer  index  £  for  picking  the 
next  element  from  the  sequence  (yi)f={)  is  initialized  to  —1 
by  line  (iii),  it  is  repeatedly  increased  by  line  (xi).  The  lines 
(v)-(xix)  that  follow  the  repeat  in  (iv)  perform  a  complete 
iterative  step  from  xk  to  xk+ 1 ,  infinite  repetitions  of  such  steps 
provide  the  sequence  Rt  —  (xk)f={].  During  one  iterative  step, 
there  is  one  application  of  the  operator  P7 ,  in  line  (xviii),  but 
there  are  N  steering  steps  aimed  at  reducing  the  value  of  <p; 
the  latter  are  done  by  lines  (v)-(xvii).  These  lines  produce  a 
sequence  of  points  xk,n,  where  0  <  n  <  N  with  xk,°  =  xk, 
xk,n  e  A,  and  <p(xk'n)  <  <p(xk). 

We  prove  the  truth  of  the  last  sentence  by  induction  on 
the  non-negative  integers.  For  n  —  0,  we  have  by  lines  (v) 
and  (vi)  that  xk,°  —  xk .  But  xk  e  Q.  ,  since  it  is  either  x  that 
is  assumed  to  be  in  Q  due  to  lines  (i)  and  (ii)  or  it  is  in  the 
range  Q  of  P7  due  to  lines  (xviii)  and  (xix).  Now  we  assume, 
for  any  0  <  n  <  N,  that  xk'n  e  A  and  <p(xk’n)  <  <j)(xk)  and 
show  that  lines  (viii)-(xvii)  perform  a  computation  that  leads 
from  xk'n  to  an  xk,n+l  e  A  that  satisfies  <p(xk’n+1)  <  4>{xk). 
To  see  this,  observe  that  line  (viii)  sets  vk,n  to  be  a  nonascend¬ 
ing  vector  for  <j>  at  xk,n,  which  implies  that  Eq.  (7)  is  satis¬ 
fied  with  x  —  xk'n  and  d  =  vk'n .  Line  (ix)  sets  loop  to  true, 
and  it  remains  true  while  searching  for  the  desired  xkjl+[ , 
by  repeatedly  executing  the  loop  sequence  that  follows  line 
(x).  In  this  sequence,  line  (xi)  increases  £  by  1  and  line  (xii) 
sets  /1a,  n  to  yi.  Thus  for  the  vector  z  defined  by  line  (xiii), 
z  e  A  and  <p(z )  <  <p(xk'"),  provided  that  fkn  is  not  greater 
than  the  S  in  Eq.  (7).  Since  (y7)“  0  is  a  summable  sequence 
of  positive  real  numbers,  there  must  be  a  positive  integer  L 
such  that  yi  <  S,  for  all  £  >  L.  This  implies  that  if  we  ap¬ 
plied  lines  (xi)-(xiii)  often  enough,  we  would  reach  a  vector 
z  that  satisfies  z  e  A  and  <p(z)  <  <pixk'n).  If  the  condition  in 
line  (xiv)  is  not  satisfied  when  the  process  gets  to  it,  then  lines 


Medical  Physics,  Vol.  39,  No.  9,  September  2012 


5538 


Herman  et  al.\  Superiorization:  An  optimization  heuristic  for  medical  physics 


5538 


(xi)-(xiii)  are  again  executed  and  eventually  we  get  a  vector 
z  for  which  the  condition  in  line  (xiv)  is  satisfied  due  to  the 
induction  hypothesis  that  (p(xk’n )  <  <p(xk).  By  lines  (xv)  and 
(xvi)  we  see  that  at  that  time  x4’"44  is  set  to  z  and  so  we  ob¬ 
tain  that  xk’n+1  e  A  and  <p(xk'n+l)  <  4>{xk),  as  desired.  Line 
(xvii)  sets  loop  to  false  and  so  control  is  returned  to  line  (vii). 
When  this  happens  for  the  Mh  time,  it  will  be  the  case  that  n 
—  N  and,  therefore,  line  (xviii)  is  used  to  produce  xk+ 1  e  £2 
and  the  increasing  of  k  by  line  (xix)  allows  us  then  to  move 
on  to  the  next  iterative  step.  Infinite  repetition  of  such  steps 
produces  the  sequence  Rj  =  (xk)^L0  of  points  in  £2. 

We  now  show  that  if  0(T,  s,  ((P-/)4x)j*i0)  is  defined  for 
every  te£2,  then,  for  any  s'  >  e,  the  Superiorized  Version 
of  Algorithm  P  produces  an  e'-compatible  output.  Since  P 
is  assumed  to  be  strongly  perturbation  resilient,  this  desired 
result  follows  if  we  can  show  that  there  exists  a  summable 
sequence  )*2=o  °f  non-negative  real  numbers  and  a  bounded 
sequence  (vk)'£L0  of  vectors  in  Ry  such  that  Eq.  (6)  is  satisfied 
for  all  k  >  0.  In  view  of  line  (xviii),  this  is  achieved  if  we  can 
define  the  Pk  and  the  vk  so  that  xk,N  —  xk  +  PkVk.  This  is 
done  by  setting 


Pk  =  ma x{pk,n  \0<n<N},  (8) 


That  these  assignments  result  in  xk'N  =  xk  +  PkVk  follows 
from  lines  (v)-(xvii).  From  line  (xii)  follows  that  (Pk)kL{)  is 
a  subsequence  of  and,  hence,  it  is  a  summable  se¬ 

quence  of  non-negative  real  numbers.  Since  each  ||u4,"||  <  1 
by  the  definition  of  a  nonascending  vector,  it  follows  from 
Eqs.  (8)  and  (9)  that  ||t/||  <  N  and  so  (vk)fL0  is  bounded. 
Part  of  the  condition  expressed  in  Eq.  (6)  is  that,  for  all 
k  >  0,  xk  +  PkVk  e  A.  This  follows  from  the  fact  that 
xk’N  =  xk  +  PkVk  is  assigned  its  value  by  line  (xvi),  but  only 
if  the  condition  expressed  in  line  (xiv)  is  satisfied. 

In  conclusion,  we  have  shown  that  the  superiorized  ver¬ 
sion  of  a  strongly  perturbation  resilient  algorithm  produces 
outputs  that  are  essentially  as  constraints-compatible  as  those 
produced  by  the  original  version  of  the  algorithm.  However, 
due  to  the  repeated  steering  of  the  process  by  lines  (vii)-(xvii) 
towards  reducing  the  value  of  the  optimization  criterion  0,  we 
can  expect  that  the  output  of  the  superiorized  version  will  be 
superior  (from  the  point  of  view  of  <p)  to  the  output  of  the 
original  algorithm. 


N- 1 


Pk,n  kn 

0 

n= 0 


(9) 


II. F.  Information  on  performance  comparison 
with  MAP  methods 

Using  our  notation,  the  constrained  minimization  formula¬ 
tion  that  we  are  considering  is  as  follows:  Given  an  s  e  R+, 


minimize  0(x),  subject  to  Vrj{x)  <  e.  (10) 


The  aim  of  superiorization  is  not  identical  with  the  aim  of 
constrained  minimization  in  Eq.  (10).  One  difference  is  that  s 
is  not  “given”  in  the  superiorization  context.  The  superioriza¬ 
tion  of  an  algorithm  produces  a  sequence  and,  for  any  e,  the 
associated  output  of  the  algorithm  is  considered  to  be  the  first 
x  in  the  sequence  for  which  Vrr(x)  <  s.  The  other  difference 
is  that  we  do  not  claim  that  this  output  is  a  minimizer  of  0 
among  all  points  that  satisfy  the  constraint,  but  hope  only  that 
it  is  usually  an  x  for  which  0(x)  is  at  the  small  end  of  its  range 
of  values  over  the  set  of  constraint-satisfying  points.  This  lat¬ 
ter  difference  is  generally  shared  by  comparisons  of  a  heuris¬ 
tic  approach  with  an  exact  approach  to  solving  a  constrained 
minimization  problem. 

The  MAP  (or  regularized)  formulation  of  a  physical  prob¬ 
lem  that  leads  to  the  constrained  minimization  problem  (10) 
is  the  unconstrained  minimization  problem  of  the  form:  Given 
a^el+, 

minimize  [0(x)  +  pVrT(x)].  (11) 

Formulations  of  both  kinds  [i.e.,  the  ones  of 
Eqs.  (10)  and  (11)]  are  widely  used  for  solving  medical 
physics  problems  and  the  question  “Which  of  these  two  for¬ 
mulations  leads  to  faster  or  better  solutions  of  the  underlying 
physical  problem?”  is  open.  Examples  of  both  formulations 
with  various  choices  for  Vrj  and  0  are  listed  in  the  beginning 
parts  of  the  paper  of  Goldstein  and  Osher.47 

We  now  return  to  the  question  raised  near  the  end  of 
Sec.  I:  Will  superiorization  produce  superior  results  to  those 
produced  by  contemporary  MAP  methods  or  is  it  faster  than 
the  better  of  such  methods?  As  yet,  there  is  very  little  informa¬ 
tion  available  regarding  this  general  question;  in  fact,  we  are 
aware  of  only  one  published  study.45  That  study  compared 
a  superiorization  algorithm  with  the  algorithm  of  Goldstein 
and  Osher  that  they  refer  to  as  TwIST  (Ref.  46)  with  split 
Bregman47  as  the  substep,  which  is  indeed  a  contemporary 
method  that  uses  the  MAP  formulation.  (For  example,  see  the 
discussion  of  the  split  Bregman  method  in  Ref.  56.)  The  prob¬ 
lem  S  to  which  the  two  algorithms  were  applied  was  one  from 
the  tomographic  problem  set  S  defined  in  Eq.  (1).  Res$  as  de¬ 
fined  in  Eq.  (2)  was  used  as  the  proximity  function  and  total 
variation,  TV  as  defined  below  in  Eq.  (12),  was  the  choice  for 
0.  It  is  reported  in  Ref.  45  that  for  the  outputs  of  the  two  algo¬ 
rithms  that  were  being  compared,  the  values  of  Ress  and  T  V 
were  very  similar,  but  the  superiorization  algorithm  produced 
its  output  four  times  faster  than  the  MAP  method. 


III.  AN  ILLUSTRATIVE  EXAMPLE 
III.A.  Application  to  tomography 

We  use  tomography  to  refer  to  the  process  of  reconstruct¬ 
ing  a  function  over  a  Euclidean  space  from  estimated  values 
of  its  integrals  along  lines  (that  are  usually,  but  not  necessar¬ 
ily,  straight).  The  particular  reconstruction  processes  to  which 
our  discussion  applies  are  the  series  expansion  methods ,  see 
Sec.  6.3  of  Ref.  55,  in  which  it  is  assumed  that  the  function 
to  be  reconstructed  can  be  approximated  by  a  linear  combi¬ 
nation  of  a  finite  number  (say  J)  of  basis  functions  and  the 
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reconstruction  task  becomes  one  of  estimating  the  coeffi¬ 
cients  of  the  basis  functions  in  the  expansion.  Sometimes, 
prior  knowledge  about  the  nature  of  the  function  to  be  recon¬ 
structed  allows  us  to  confine  the  sought-after  vector  x  of  coef¬ 
ficients  to  a  subset  !T2  of  Ry  (such  as  the  non-negative  orthant 
Ry).  We  use  i  to  index  the  lines  along  which  we  integrate, 
a'  e  Ry  to  denote  the  vector  whose  /th  component  is  the  in¬ 
tegral  of  the  /th  basis  function  along  the  /th  line,  and  b,  to  de¬ 
note  the  measured  integral  of  the  function  to  be  reconstructed 
along  the  /th  line.  Under  these  circumstances  the  constraints 
come  from  the  desire  that,  for  each  of  the  lines,  («' ,  x)  should 
be  close  (in  some  sense)  to  /?,. 

To  make  this  concrete,  consider  Eq.  (1).  Such  a  descrip¬ 
tion  of  the  constraints  arises  in  tomography  by  grouping  the 
lines  of  integration  into  W  blocks,  with  £w  lines  in  the  wth 
block.  Such  groupings  often  (but  not  always)  are  done  accord¬ 
ing  to  some  geometrical  condition  on  the  lines  (for  example, 
in  case  of  straight  lines,  we  may  decide  that  all  the  lines  that 
are  parallel  to  each  other  form  one  block).  In  this  framework, 
the  proximity  function  Res  defined  by  Eq.  (2)  provides  a  rea¬ 
sonable  measure  of  the  incompatibility  of  a  vector  x  with  the 
constraints.  The  algorithm  R  described  by  Eqs.  (3)— (5)  is  ap¬ 
plicable  to  this  concrete  formulation. 

There  are  many  optimization  criteria  that  have  been  used  in 
tomography,  see  Sec.  6.4  of  Ref.  55,  here  we  discuss  the  one 
called  TV,  whose  use  has  been  popular  in  medical  physics 
recently,  see  as  examples  Refs.  20,  22,  23,  and  41-44.  The 
definition  of  TV  that  we  use  here  requires  a  certain  way  of 
selecting  the  basis  functions.  It  is  assumed  that  the  function  to 
be  reconstructed  is  defined  in  the  plane  R2  and  is  zero-valued 
outside  a  square-shaped  region  in  the  plane.  This  region  is 
subdivided  into  J  smaller  equal- sized  squares  (pixels )  and  the 
J  basis  functions  are  defined  by  having  value  one  in  exactly 
one  pixel  and  value  zero  everywhere  else.  We  index  the  pixels 
by  j  and  we  let  C  denote  the  set  of  all  indices  of  pixels  that 
are  not  in  the  rightmost  column  or  the  bottom  row  of  the  pixel 
array.  For  any  pixel  with  index  j  in  C,  let  r(j)  and  b(j)  be  the 
index  of  the  pixel  to  its  right  and  below  it,  respectively.  We 
define  TV  :  Ry  — ►  R  by 

TV(x)  =  ^2y/(xj  -  xnj))2  +  (xj  -  xb(j) )2.  (12) 

feC 

The  method  we  adopted  to  generate  a  nonascending  vector 
for  the  TV  function  at  an  r  e  1J  is  based  on  Theorem  2  of 
the  Appendix.  It  is  applicable  since  TV  :  Ry  — >  R  is  a  con¬ 
vex  function;  see,  for  example,  the  end  of  the  Proof  of  Propo¬ 
sition  1  of  Ref.  41.  Now  consider  an  integer  /  such  that  1  </ 
<  J.  Looking  at  the  sum  in  Eq.  (12),  we  see  that  xy  appears 
in  at  most  three  terms,  in  which  /  must  be  either  j,  or  r(j),  or 
b(j)  for  some  j  e  C.  By  taking  the  formal  partial  derivatives  of 
these  three  terms,  we  see  that  jp^-(x)  is  well  defined  if  the  de¬ 
nominator  in  the  formal  derivative  of  each  of  the  three  terms 
is  not  zero  for  x.  In  view  of  this,  we  define  the  g  in  Theorem  2 
as  follows.  If  the  denominator  in  any  of  the  three  formal  par¬ 
tial  derivatives  with  respect  to  xy  has  an  absolute  value  less 
than  a  very  small  positive  number  (we  used  1 0  20),  then  we 
set  gj'  to  zero,  otherwise  we  set  it  to  j^j(x).  Clearly,  the  re¬ 


sulting  g  e  Ry  satisfies  the  condition  in  Theorem  2  and  hence 
provides  a  d  that  is  a  nonascending  vector  for  TV  at  jr. 

Previously  reported  reconstructions  using  T  E-superior- 
ization  selected  the  d  using  subgradients  as  discussed  in  the 
paragraph  following  Eq.  (7);  such  a  d  is  not  guaranteed  to 
be  a  nonascending  vector  for  the  T  V  function.  What  we  are 
proposing  here  is  not  only  mathematically  rigorous  (in  the 
sense  that  it  is  guaranteed  to  produce  a  nonascending  vector 
for  the  TV  function),  but  it  can  also  lead  to  a  better  recon¬ 
structions,  as  illustrated  in  Subsection  III.D. 


Ill . B .  The  data  generation  for  the  experiments 

The  datasets  used  in  the  experiments  reported  in  this 
paper  were  generated  in  such  a  way  that  they  share  the 
noise-characteristics  of  CT  scanners  when  used  for  scanning 
the  human  head  and  brain;  as  discussed,  for  example,  in 
Chap.  5  of  Ref.  55.  They  were  generated  using  the  software 
SNARK09.57 

The  head  phantom  that  was  used  for  data  generation  is 
based  on  an  actual  cross  section  of  the  human  head.  It  is  de¬ 
scribed  as  a  collection  of  geometrical  objects  (such  as  ellipses, 
triangles,  and  segments  of  circles)  whose  combination  accu¬ 
rately  resembles  the  anatomical  features  of  the  actual  head 
cross  section.  In  addition,  the  basic  phantom  contains  a  large 
tumor.  The  actual  phantom  used  was  obtained  by  a  random 
variation  of  the  basic  phantom,  by  incorporating  into  it  lo¬ 
cal  inhomogeneities  and  small  low-contrast  tumors  at  ran¬ 
dom  locations.  This  phantom  is  represented  by  the  image  in 
Fig.  1(a).  That  image  comprises  485  x  485  pixels  each  of  size 
0.376  mm  by  0.376  mm.  The  values  assigned  to  the  pixels  are 
obtained  by  an  1 1  x  11  subsampling  of  the  pixels  and  aver¬ 
aging  the  values  assigned  to  the  subsamples  by  the  geomet¬ 
rical  objects  that  are  used  to  describe  the  anatomical  features 
and  the  tumors.  Those  values  are  approximate  linear  atten¬ 
uation  coefficients  per  cm  at  60  keV  (0.416  for  bone,  0.210 
for  brain,  0.207  for  cerebrospinal  fluid).  The  contrast  of  the 
small  tumors  with  their  background  is  0.003  cm4 .  In  order 
to  clearly  see  the  low-contrast  details  in  the  interior  of  the 
skull,  we  use  zero  (black)  to  represent  the  value  0.204  (or  any¬ 
thing  less)  and  255  (white)  to  represent  0.21675  or  anything 
more).  The  same  is  true  for  all  the  images  in  the  rest  of  this 
paper. 

For  the  selected  head  phantom  we  generated  parallel 
projection  data ,  in  which  one  view  comprises  estimates  of 
integrals  through  the  phantom  for  a  set  of  693  equally  spaced 
parallel  lines  with  a  spacing  of  0.0376  cm  between  them.  (We 
chose  to  simulate  parallel  rather  than  divergent  projection 
data,  since  the  reconstruction  by  the  method  of  Ref.  42  with 
which  we  wish  to  compare  the  superiorization  approach  was 
performed  for  us  by  the  authors  of  Ref.  42  on  parallel  data. 
Even  though  contemporary  CT  scanners  use  divergent  pro¬ 
jection  data,  results  obtained  by  the  use  of  parallel  projection 
data  are  relevant  to  them,  since  it  is  known  that  the  quality  of 
reconstructions  from  these  two  modes  of  data  collection  are 
very  similar  as  long  as  the  data  generations  use  similar  fre¬ 
quencies  of  sampling  of  lines  and  similar  noise  characteristics 
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Fig.  1 .  (a)  A  head  phantom,  (b)  Reconstruction  of  the  head  phantom  from  realistically  simulated  projection  data  for  360  views  using  ART  with  blob  basis 
functions. 


in  the  estimated  integrals  for  those  lines;  see,  for  example,  the 
reconstructions  from  divergent  and  parallel  projection  data  in 
Fig.  5.15  of  Ref.  55.)  In  calculating  these  estimates,  we  take 
into  consideration  the  effects  of  photon  statistics,  detector 
width,  and  scatter.  Details  of  how  we  do  this  exactly  can  be 
found  in  Secs.  5.5  and  5.9  of  Ref.  55.  Briefly,  quantum  noise 
is  calculated  based  on  the  assumption  that  approximately 
2  000  000  photons  enter  the  head  along  each  ray,  detector 
width  is  simulated  by  using  11  subrays  along  each  of 
which  the  attenuation  is  calculated  independently  and  then 
combined  at  the  detector,  and  5%  of  the  photons  get  counted 
not  by  the  detector  for  the  ray  in  question  but  detectors  for 
the  neighboring  rays.  For  the  experiments  in  this  paper,  we 
did  not  simulate  the  polyenergetic  nature  of  the  x-ray  source. 


To  indicate  what  can  be  achieved  in  clinical  CT,  we  show  in 
Fig.  1(b)  a  reconstruction  that  was  made  from  data  comprising 
of  360  such  views  with  the  reconstruction  algorithm  known 
as  ART  with  blob  basis  functions;  see  Chap.  1 1  of  Ref.  55. 

III.C.  Superiorization  reconstruction  from  a  few  views 

The  main  reason  in  the  literature  for  advocating  the  use  of 
TV  as  the  optimization  criterion  is  that  by  doing  so  one  can 
achieve  efficacious  reconstructions  even  from  sparsely  sam¬ 
pled  data.  In  our  own  work’1  with  realistically  simulated  CT 
data,  we  found  that  this  is  not  always  the  case  and  this  will  be 
demonstrated  again  by  the  experiments  reported  in  the  current 
paper. 


Fig.  2.  Reconstructions  using  TV  as  the  optimization  criterion  from  realistically  simulated  projection  data  for  60  views  using  (a)  ASD-POCS  and  (b)  supe¬ 
riorization.  As  compared  to  Fig.  1(b),  these  reconstructions  fail  in  two  ways:  they  do  not  show  some  of  the  fine  details  in  the  phantom  and  they  present  some 
artifactual  variations.  The  former  of  these  is  a  consequence  of  reconstructing  from  a  much  smaller  dataset  than  used  for  Fig.  1(b).  The  latter  is  due  to  using  a 
very  narrow  window  (13.5  HU)  in  these  displays.  Were  we  to  use  a  wider  display  window  (e.g.,  from  -4-29  HU  to  429  HU)  for  the  reconstructions  in  this  figure 
and  in  Fig.  1(b),  the  visual  appearance  of  the  resulting  images  would  be  nearly  indistinguishable. 
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There  have  appeared  in  the  literature  some  approaches  to 
T  V  minimization  that  seem  to  indicate  a  more  efficacious  per¬ 
formance  for  CT  than  the  one  reported  in  Ref.  31.  One  of 
these  is  the  adaptive  steepest  descent  projections  onto  convex 
sets  (ASD-POCS)  algorithm,  which  is  described  in  detail  in 
the  much-cited  paper  of  Sidky  and  Pan42  and  whose  use  has 
been  since  reported  in  a  number  of  subsequent  publications, 
for  example,  in  Refs.  23  and  43.  We  note  that  ASD-POCS 
was  designed  with  the  aim  of  producing  an  exact  minimiza¬ 
tion  algorithm,  in  contrast  to  our  heuristic  superiorization  ap¬ 
proach.  Translating  Eqs.  (6)— (8)  of  Ref.  42  into  our  termi¬ 
nology,  the  aim  of  ASD-POCS  is  the  following:  Given  an 
s  e  M+,  find  an  e -compatible  x  e  £2  =  R7  for  which  TV(x) 
is  minimal.  [Note  that  this  aim  is  a  special  case  of  the  con¬ 
strained  optimization  formulation  presented  in  Eq.  (10).]  In 
order  to  test  ASD-POCS,  we  generated  realistic  projection 
data  as  described  in  Subsection  III.B  but  for  only  60  views 
at  3°  increments  with  the  spacing  between  the  lines  for  which 
integrals  are  estimated  set  at  0.752  mm.  Thus  the  number  of 
rays  (and  hence  the  number  photons  put  into  the  head)  in  this 
dataset  is  a  12th  of  what  it  is  in  the  dataset  used  to  produce 
the  reconstruction  in  Fig.  1(b).  A  reconstruction  from  these 
data  was  produced  for  us  using  ASD-POCS  by  the  authors  of 
Ref.  42  (this  ensured  that  it  does  not  suffer  due  to  our  misinter¬ 
pretation  of  the  algorithm  or  from  our  inappropriate  choices 
of  the  free  parameters),  it  is  shown  in  Fig.  2(a). 

Since  the  image  quality  of  Fig.  2(a)  is  not  anywhere  near 
to  that  of  Fig.  1(b),  we  present  here  a  brief  discussion  as  to 
why  we  are  showing  such  images.  Many  publications  in  the 
recent  medical  imaging  literature  have  claimed  that  medically 
efficacious  reconstructions  can  be  obtained  by  the  use  of  TV  - 
minimization  from  data  as  sparse  as  what  was  used  to  produce 
Fig.  2(a).  (In  fact,  ASD-POCS  was  motivated  and  used  with 
such  an  aim  in  mind.23'42,43)  Such  publications  usually  show 
reconstructions  from  sparse  data  as  evidence  for  the  validity 
of  their  claims.  They  can  do  this  because  in  their  presented 
illustrations  the  features  that  are  observable  in  the  reconstruc¬ 
tions  are  usually  much  larger  and/or  of  much  higher  contrast 
against  their  backgrounds  than  the  small  “tumors”  in  Fig.  1  (a), 
which  are  perfectly  visible  in  the  reconstruction  in  Fig.  1(b), 
but  are  not  detectable  in  the  reconstruction  from  sparse  data 
in  Fig.  2(a).  The  reason  why  that  reconstruction  appears  to  be 
unacceptably  bad  is  that  the  display  window  (from  0.204  cnT1 
linear  attenuation  coefficient  to  0.21675  cm4  linear  attenua¬ 
tion  coefficient)  is  very  narrow;  it  was  selected  to  enhance 
the  visibility  of  the  small  low-contrast  tumors.  The  width  of 
this  window  corresponds  to  about  13.5  Hounsfield  units  (HU). 
As  compared  to  this,  in  their  evaluation  of  sparse-view  recon¬ 
struction  from  flat-panel-detector  cone-beam  CT,  Bian  et  a/.43 
use  what  they  call  a  “soft-tissue  grayscale  window”  (also  a 
“narrow  window”)  from  -429  HU  to  429  HU  to  display  head 
phantom  reconstructions.  Using  such  a  window  for  our  re¬ 
constructions  shown  Figs.  2(a)  and  1(b)  would  result  in  im¬ 
ages  that  are  nearly  indistinguishable  from  each  other.  Thus 
reporting  the  images  using  such  a  display  window  is  consis¬ 
tent  with  the  claim  that  a  TV-minimizing  reconstruction  from 
a  few  views  is  similar  in  quality  to  a  more  traditional  recon¬ 
struction  from  many  views.  However,  our  much  narrower  dis¬ 


play  window  reveals  that  this  is  not  really  so.  We  therefore 
continue  using  our  much  narrower  window  in  what  follows, 
since  it  clearly  reveals  the  nature  of  the  reconstructions  being 
compared,  warts  and  all. 

While  this  ASD-POCS  reconstruction  is  not  as  good  as  it 
should  be  for  diagnostic  CT  of  the  brain  (due  to  the  sparsity 
of  the  data),  it  is  visually  better  than  the  reconstruction  using 
superiorization  from  similar  data  as  reported  in  Ref.  31.  We 
discuss  the  reasons  for  this  in  Subsection  III.D.  Here,  we  con¬ 
centrate  on  examining  whether  one  can  achieve  a  reconstruc¬ 
tion  using  superiorization  that  is  as  good  as  that  produced  by 
ASD-POCS  from  the  same  data. 

For  this  we  first  need  to  examine  the  numerical  properties 
of  the  ASD-POCS  reconstruction.  This  reconstruction  uses 
485  x  485  pixels  each  of  size  0.376  mm  by  0.376  mm.  This 
implies  that  J  —  235,225  and  it  also  determines  the  compo¬ 
nents  of  the  vectors  a1  e  IR7  in  the  precise  specification  of 
the  problem  S.  The  Res$,  as  defined  by  Eq.  (2),  of  the  ASD- 
POCS  reconstruction  is  0.33  and  the  TV,  as  defined  by  Eq. 
(12),  is  835. 

We  applied  to  the  same  problem  S  a  superiorized  version 
of  the  algorithm  R  defined  by  Eq.  (3).  To  complete  the  spec¬ 
ification  of  R,  we  point  out  that  for  the  ordering  of  views  we 
chose  the  “efficient”  one  that  was  introduced  in  Ref.  58  and 
is  also  discussed  on  p.  209  of  Ref.  55.  The  choices  we  made 
for  the  superiorization  are  the  following:  yi  =  0.99995 f  ,  x 
is  the  zero  vector,  and  N  =  20.  The  nonascending  vector  was 
computed  by  the  method  described  in  the  paragraph  below 
[Eq.  (12)].  Denoting  by  Rs  the  infinite  sequence  of  points  in 

that  is  produced  by  the  superiorized  version  of  the  algo¬ 
rithm  R  when  applied  to  the  problem  S,  we  chose  as  our  re¬ 
construction  jc*  =  0(S.  0.33,  Rs).  For  such  a  reconstruction 
we  have,  by  the  definition  of  O,  that  Ress(x*)  <  0.33;  in  other 
words,  the  output  of  the  superiorization  algorithm  is  at  least 
as  constraints-compatible  with  S  as  the  output  of  ASD-POCS. 
From  the  point  of  view  of  T  V -minimization,  our  x*  is  slightly 
better:  TV(x*)  =  826. 

The  superiorization  reconstruction  is  displayed  in 
Fig.  2(b).  Visually,  it  is  similar  to  the  reconstruction  produced 
by  ASD-POCS.  From  the  optimization  point  of  view  it 
achieves  the  desired  aim  better  than  ASD-POCS  does,  since 
it  results  in  smaller  values  for  both  Ress  and  for  TV,  even 
though  only  slightly. 

That  the  two  reconstructions  in  Fig.  2  are  very  similar  is 
not  surprising  because  a  comparison  of  the  pseudocodes  re¬ 
veals  that  the  ASD-POCS  algorithm  in  Ref.  42  is  essentially  a 
special  case  of  the  Superiorized  Version  of  Algorithm  P,  even 
though  it  has  been  derived  from  rather  different  principles.  To 
obtain  the  ASD-POCS  algorithm  from  our  methodology  de¬ 
scribed  here,  we  would  have  to  choose  ART  (see  Chap.  11 
of  Ref.  55)  as  the  algorithm  that  we  are  superiorizing.  Such 
a  superiorization  of  ART  was  reported  in  the  earliest  paper 
on  superiorization.27  For  the  illustration  in  our  current  paper, 
we  decided  to  superiorize  the  block-iterative  algorithm  R  de¬ 
fined  by  Eq.  (3).  This  illustrates  the  generality  of  the  superi¬ 
orization  approach:  it  is  applicable  not  only  to  a  large  class 
of  constrained  optimization  problems,  but  also  enables  the 
use  of  any  of  a  large  class  of  iterative  algorithms  designed  to 
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produce  a  constraints-compatible  solutions.  A  recent  publica¬ 
tion  aimed  at  producing  an  exact  TV- minimizing  algorithm 
based  on  the  block-iterative  approach  is  Ref.  44. 

III.D.  Effects  of  variations  in  the  reconstruction 
approach 

The  reconstruction  in  Fig.  2(a)  produced  by  ASD-POCS 
definitely  “looks  better”  than  a  reconstruction  in  Ref.  31, 
which  was  obtained  using  superiorization  from  similar  data. 
Since,  as  discussed  in  the  last  paragraph  of  Subsection  III.C, 
the  ASD-POCS  algorithm  in  Ref.  42  can  be  obtained  as  a  spe¬ 
cial  case  of  superiorization,  it  must  be  that  some  of  the  choices 
made  in  the  details  of  the  implementations  are  responsible  for 
the  visual  differences.  An  analysis  of  the  implementational 
details  adopted  by  the  two  approaches  revealed  several  differ¬ 
ences.  After  removing  these  differences,  the  superiorization 
approach  produced  the  image  in  Fig.  2(b),  which  is  very  sim¬ 
ilar  to  the  reconstruction  produced  by  ASD-POCS.  We  now 
list  the  implementational  choices  that  were  made  for  superi¬ 
orization  to  make  its  performance  match  that  of  the  reported 
implementation  of  ASD-POCS. 

One  implementational  difference  is  in  the  stopping-rule  of 
the  iterative  algorithm;  that  is,  the  choice  of  e  in  determin¬ 
ing  the  output  0(S,  s,  Rs )■  Since  the  data  are  noisy,  the  phan¬ 
tom  itself  does  not  match  the  data  exactly.  In  previously  re¬ 
ported  implementations  of  superiorization  it  was  assumed  that 
the  iterative  process  should  terminate  when  an  image  is  ob¬ 
tained  that  is  approximately  as  constraints-compatible  as  the 
phantom;  in  the  case  of  the  phantom  and  the  projections  data 
on  which  we  report  here  the  value  of  Res$  for  the  phantom 
is  approximately  0.91,  which  is  larger  than  its  value  (0.33) 
for  the  reconstruction  produced  by  ASD-POCS.  The  output 
0(S,  0.91,  Rs)  is  shown  in  Fig.  3(a).  This  is  a  wonderfully 
smooth  reconstruction,  its  TV  value  is  only  771.  However, 
this  smoothness  comes  at  a  price:  we  lose  not  only  the  abil¬ 
ity  to  detect  the  large  tumor,  but  we  cannot  even  see  anatomic 
features  (such  as  the  ventricular  cavities)  inside  the  brain.  So 
it  appears  that,  in  order  to  see  medically  relevant  features  in 
the  brain,  overfitting  (in  the  sense  of  producing  a  reconstruc¬ 
tion  from  noisy  data  that  is  more  constraints-compatible  than 
the  phantom)  is  desirable. 

In  the  implementations  that  produced  previously  reported 
reconstructions  by  superiorization,  the  number  N  in  the  Supe- 
riorized  Version  of  Algorithm  P  was  always  chosen  to  be  1 . 
It  is  possible  that  this  is  the  wrong  choice,  making  only  this 
change  to  what  lead  to  the  reconstruction  in  Fig.  2(b)  results 
in  the  reconstruction  shown  in  Fig.  3(b).  That  image  appears 
similar  to  the  image  in  Fig.  2(b),  but  it  has  a  higher  TV  value, 
namely,  832,  which  is  still  very  slightly  lower  than  that  of  the 
ASD-POCS  reconstruction.  The  choice  N  =  20  was  based  on 
the  desire  to  maintain  consistency  with  what  has  been  prac¬ 
ticed  using  ASD-POCS,  see  p.  4790  of  Ref.  42.  It  appears  that 
in  the  context  of  our  paper  the  additional  computing  cost  due 
to  choosing  A  to  be  20  rather  than  1  is  not  really  justified.  (We 
note  that  if  d  is  selected  using  subgradients  as  discussed  in  the 
paragraph  following  Eq.  (7)  and  thus  d  is  not  guaranteed  to  be 
a  nonascending  vector  for  the  T  V  function,  then  the  choice  of 


20  rather  than  1  for  N  results  in  a  considerable  improvement. 
However,  an  even  greater  improvement  is  achieved  even  with 
N  —  1  by  selecting  d  as  recommended  in  this  paper.) 

Another  important  difference  between  the  ASD-POCS  im¬ 
plementation  and  the  previous  implementations  of  the  superi¬ 
orization  approach  is  the  size  of  the  pixels  in  the  reconstruc¬ 
tions.  For  the  ASD-POCS  reconstruction  this  was  selected  to 
be  0.376  mm  by  0.376  mm.  In  previously  reported  reconstruc¬ 
tions  by  superiorization  it  was  assumed  that  the  edge  of  a 
pixel  should  be  the  same  as  the  distance  between  the  paral¬ 
lel  lines  along  which  the  data  are  collected;  that  is,  0.752  mm 
for  our  problem  S.  This  assumption  proved  to  be  false.  TV - 
minimization  takes  care  of  undesirable  artifacts  that  may  oth¬ 
erwise  arise  due  to  the  smaller  pixels  and  this  leads  to  a  visual 
improvement.  A  superiorizing  reconstruction  with  the  larger 
pixels,  using  e  =  0.33  and  N  =  20,  is  shown  in  Fig.  3(c). 
(We  note  that  the  use  of  smaller  pixels  during  iterative  x-ray 
CT  reconstructions  was  also  suggested  in  Ref.  59.  However, 
that  approach  is  quite  different  from  what  is  presented  here: 
its  final  result  uses  larger  pixels  whose  values  are  obtained  by 
averaging  assemblies  of  values  provided  by  the  iterative  pro¬ 
cess  to  the  smaller  pixels.  There  is  no  such  downsampling  in 
our  approach,  our  final  result  is  presented  using  the  smaller 
pixels.  Its  smoothness  is  due  to  reduction  of  7V  by  the  supe¬ 
riorization  approach  rather  than  to  averaging  pixel  values  in  a 
denser  digitization.) 

Combining  the  use  of  the  larger  pixels  with  e  =  0.91  and 
N  =  1  results  in  the  reconstruction  shown  in  Fig.  3(d).  This 
reconstruction,  for  which  the  superiorization  options  were  se¬ 
lected  according  to  what  was  done  in  Ref.  31,  is  visually 
inferior  to  those  shown  in  our  Fig.  2.  The  reconstructions 
displayed  in  Fig.  3  also  illustrate  another  important  point, 
namely,  that  even  though  the  mathematical  results  discussed 
in  this  paper  are  valid  for  a  large  range  of  choices  of  the  pa¬ 
rameters  in  the  superiorization  algorithms,  for  medical  effi¬ 
cacy  of  the  reconstructions  attention  has  to  be  paid  to  these 
choices  since  they  can  have  a  drastic  effect  on  the  quality  of 
the  reconstruction. 

It  has  been  mentioned  in  Subsection  II.B  that  except  for 
the  presence  of  Q  in  Eq.  (3),  which  enforces  non-negativity 
of  the  components,  R  is  identical  to  the  algorithm  used  and 
illustrated  in  Ref.  31.  It  is  known  that  CT  reconstruction  of 
the  brain  from  many  views  does  not  suffer  from  ignoring 
the  fact  that  the  components  of  the  jc,  which  represent  linear 
attenuation  coefficients,  should  be  non-negative;  as  is  illus¬ 
trated  in  Fig.  1(b).  This  remains  so  when  reconstructing  from 
a  few  views  using  the  method  and  data  that  we  have  been  dis¬ 
cussing:  if  we  do  everything  in  exactly  the  same  way  as  was 
done  to  obtain  the  reconstruction  with  T  V  value  826  that  is 
shown  in  our  Fig.  2(b)  but  remove  Q  from  Eq.  (3),  then  we 
obtain  a  reconstruction  in  Fig.  4(a)  whose  TV  value  is  829. 

Another  variation  that  deserves  discussion,  because  it  has 
been  suggested  in  the  literature,22  is  one  that  does  not  come 
about  by  making  choices  for  the  general  approach  of  the  Su- 
periorized  Version  of  Algorithm  P  but  rather  by  changing  the 
nature  of  the  approach.  The  variation  in  question  is  not  appli¬ 
cable  in  general,  but  can  be  applied  to  the  special  case  when 
the  algorithm  to  be  superiorized  is  the  R  defined  by  Eq.  (3).  It 
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(c)  (d) 


Fig.  3.  Reconstructions  produced  by  varying  some  of  the  parameters  in  the  algorithm  that  produced  Fig.  2(b).  (a)  Changing  the  termination  criterion  form 
s  =  0.33  to  s  =  0.91.  (b)  Changing  the  value  of  N  from  20  to  1.  (c)  Reconstructing  with  pixel  size  0.752  mm  by  0.752  mm  instead  of  0.376  mm  by  0.376  mm. 
(d)  Reconstructing  with  all  the  three  changes  of  (a)-(c). 


Fig.  4.  Reconstructions  by  variations  that  do  not  fit  into  the  framework  within  which  the  previously  shown  reconstructions  were  produced,  (a)  Not  using 
non-negativity  in  the  algorithm,  (b)  Interleaving  perturbations  with  blocks. 
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was  suggested  as  an  improvement  to  the  approach  presented 
above  with  the  choice  N  =  1 .  The  idea  was  based  on  recog¬ 
nizing  the  block-iterative  nature  of  the  algorithmic  operator 
R  s  in  Eq.  (3)  and  intermingling  the  perturbation  steps  of  lines 
(vii)-(xvii)  of  the  Superiorized  Version  of  Algorithm  R  with 
the  projection  steps  B.v, ,  . . . ,  Bsff  of  Eq.  (3).  It  was  reported 
in  Ref.  22  that  doing  this  is  advantageous  to  using  the  Supe¬ 
riorized  Version  of  Algorithm  R.  However,  when  we  applied 
the  variation  of  the  Superiorized  Version  of  Algorithm  R  that 
is  proposed  in  Ref.  22  to  the  problem  S  that  we  have  been 
using  in  this  section,  we  ended  up  with  the  reconstruction  in 
Fig.  4(b)  whose  TV  value  is  920.  This  is  not  as  good  as  what 
was  obtained  using  the  version  of  the  algorithm  that  produced 
the  reconstruction  in  Fig.  2(b).  We  conclude  that  the  variation 
suggested  by  Ref.  22,  which  does  not  fit  into  the  theory  of  our 
paper,  does  not  have  an  advantage  over  what  we  are  proposing 
here,  at  least  for  the  problem  S  that  we  have  been  discussing  in 
this  section.  We  conjecture  that  the  improvement  reported  in 
Ref.  22  is  due  to  selecting  d  using  subgradients  as  discussed 
in  the  paragraph  following  Eq.  (7)  and,  as  discussed  earlier, 
such  an  improvement  is  not  obtained  if  d  is  selected  by  the 
more  appropriate  method  recommended  in  this  paper. 


IV.  DISCUSSION  AND  CONCLUSIONS 

Constrained  optimization  is  an  often-used  tool  in  medical 
physics.  The  methodology  of  superiorization  is  a  heuristic  (as 
opposed  to  exact)  approach  to  constrained  optimization. 

Although  the  idea  of  superiorization  was  introduced  in 
2007  and  its  practical  use  has  been  demonstrated  in  several 
publications  since,  this  paper  is  the  first  to  provide  a  solid 
mathematical  foundation  to  superiorization  as  applied  to  the 
noisy  problems  of  the  real  world.  These  foundations  include  a 
precise  definition  of  constraints-compatibility,  the  concept  of 
a  strongly  perturbation  resilient  algorithm,  simple  conditions 
that  ensure  that  an  algorithm  is  strongly  perturbation  resilient, 
the  superiorized  version  of  an  algorithm  and  the  showing  that 
the  superiorized  version  of  a  strongly  perturbation  resilient 
algorithm  produces  outputs  that  are  essentially  as  constraints- 
compatible  as  those  produced  by  the  original  version  but  are 
likely  to  have  a  smaller  value  of  the  chosen  optimization  cri¬ 
terion. 

The  approach  is  very  general.  For  any  iterative  algorithm 
P  and  for  any  optimization  criterion  0  for  which  we  know 
how  to  produce  nonascending  vectors,  the  pseudocode  given 
in  Subsection  II. E  automatically  provides  the  version  of  P  that 
is  superiorized  for  0. 

We  demonstrated  superiorization  for  tomography  when  to¬ 
tal  variation  is  used  as  the  optimization  criterion.  In  particu¬ 
lar,  we  illustrated  on  a  particular  tomography  problem  that,  in 
spite  of  its  generality,  superiorization  produced  a  reconstruc¬ 
tion  that  is  as  good  as  (from  the  points  of  view  of  constraints- 
compatibility  and  T  V -minimization)  what  was  obtained  by 
the  ASD-POCS  algorithm  that  was  specially  designed  for 
T  V -minimization  in  tomography. 
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APPENDIX:  MATHEMATICAL  PROOFS 
1.  Conditions  for  strong  perturbation  resilience 

Theorem  1.  Let  P  be  an  algorithm  for  a  problem  structure 
(T,  Vr)  such  that,  for  all  T  e  T,  P  is  boundedly  convergent 
for  T,  VrT  :  Q  — >  E  is  uniformly  continuous,  and  P7  :  A 
— >  £2  is  nonexpansive.  Then  P  is  strongly  perturbation  re¬ 
silient. 

Proof.  We  first  show  that  there  exists  an  s  e  R+  such 
that  0(T,  s,  ((Py-/jc)jR0)  is  defined  for  every  x  e  f2.  Un¬ 
der  the  assumptions  of  the  theorem,  let  y  e  M  +  be  such 
that  Vrx(y( jc))  <  y,  for  every  red.  We  prove  that 
0(T,  2 y,  ((Pr)kx)/?Lo)  's  defined  for  every  x  e  L!  as  follows. 
Select  a  particular  x  e  By  uniform  continuity  of  VrT, 
there  exists  a  S  >  0,  such  that  \VrT(z)  —  VrT(y(x))\  <  y, 
for  any  z  e  £2  for  which  ||z  —  y(x)||  <  S.  Since  P  is  conver¬ 
gent  for  7',  there  exists  a  non-negative  integer  K ,  such  that 
|| (P7')^x  —  y(x)||  <  8.  It  follows  that 

\VrT(fPT)Kx)\  <  \VrT((PT)Kx)-VrT(y(x))\  +  \VrT(y(x))\ 

<  2 y.  (Al) 

Now  let  7eT  and  e  e  R+  be  such  that  0(T,  e, 
((P;r)Ax)£i0)  is  defined  for  every  x  e  £2.  To  prove  the  theo¬ 
rem,  we  need  to  show  that  0{T,  s',  R)  is  defined  for  every  s' 
>  s  and  for  every  sequence  R  —  (xk  )jR0  of  points  in  G  for 
which,  for  all  k  >  0,  Eq.  (6)  is  satisfied  for  bounded  perturba¬ 
tions  fikVk  •  Let  s'  and  R  satisfy  the  conditions  of  the  previous 
sentence. 

For  k  >  0,  we  have,  due  to  the  nonexpansiveness  of  P7, 
that 

||x*+I  -PrxA||  =  \\PT(xk  +  pkvk)-PTxk\\  <  \\pkvk\\. 

(A2) 

Denote  ||/3,tt>A||  by  rk.  Clearly,  rk  e  K+  and  it  follows  from  the 

EOO 

rk  <  oo. 

We  next  prove  by  induction  that,  for  every  pair  of  non¬ 
negative  integers  k  and  i, 

k+i- 1 

||**+i-(Pry**||<  £  rj.  (A3) 

i=k 
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Let  k  be  an  arbitrary  non-negative  integer.  If  i  =  0,  then 
the  value  is  zero  on  both  sides  of  the  inequality  and  hence 
Eq.  (A3)  holds.  Now  assume  that  Eq.  (A3)  holds  for  an  integer 
i  >  0.  Then,  by  Eq.  (A2)  and  the  nonexpansiveness  of  Py, 

||jci'+i+1  -  (Pry+1x*||  <  \\xk+i+l  -Prjc^+!'|| 

— 1 1  p  -  (pry+ljc*n 

<  rk+i  +  ||*i+i  -  (Pr)''**|| 

k+i-l 

<  rk+i  +  Y  n 

j=k 

k+i 

=  Yrn  (A4) 

j=k 

which  completes  our  inductive  proof.  A  consequence  of 
Eq.  (A3)  is  that,  for  every  pair  of  non-negative  integers  k  and 

1, 

OO 

iix*+i'  -  (pr)i**n  <YrJ-  (A5) 

j=k 

Due  to  the  summability  of  the  non-negative  sequence 
(rk)Z0,  the  right-hand  side  (and  hence  the  left-hand  side)  of 
this  inequality  gets  arbitrarily  close  to  zero  as  k  increases. 

Since  Vrr  is  uniformly  continuous,  there  exists  a  8 
such  that,  for  all  x,  y  e  £2,  \Vrjix)  —  Vrj(y)\  <  s'  —  s  pro¬ 
vided  that  || jc  —  _y  ||  <  <5.  Select  a  A:  so  that  '}2JLk  rj  <  <5-  By 
the  assumption  that  0(T,  s,  ((Pr)**)^)  is  defined  for  ev¬ 
ery  x  e  L!,  there  exists  a  non-negative  integer  i  for  which 
Vr((PTyxk)  <  s.  From  Eq.  (A5)  we  have,  for  this  k  and  i, 
that  ||jca+'  —  (Pr)'jc<r||  <  8  and,  hence, 

\VrT{xk+i)\  <  \VrT(xk+i)-VrT((FTyxk)\ 
+\VrT((PTyxk)\ 

<  (s'  —  e)  +  s  =  s',  (A6) 

proving  that  0(T,  s’,  R)  is  defined.  □ 

2.  Nonascending  vectors  for  convex  functions 

Theorem  2:  Let  </>  :  R7  ->  R  be  a  convex  function  and  let 
relJ.  Let  g  e  R7  satisfy  the  property:  For  1  <j<J,  if  the 
jth  component  gj  of  g  is  not  zero,  then  the  partial  derivative 
|^-(jc)  of  <[>  at  x  exists  and  its  value  is  gj.  Define  d  to  be  the 
zero  vector  if  ||g||  =0  and  to  be  —  g/||g||  otherwise.  Then  d 
is  a  nonascending  vector  for  (p  at  x. 

Proof:  The  theorem  is  trivially  true  if  ||g||  =  0,  so  we  as¬ 
sume  that  this  is  not  the  case.  We  denote  by  I  the  nonempty 
set  of  those  indices  j  for  which  gj  f-  0. 

For  1  <j<J,  let  sj  be  gj/\gj\  for  j  e  I  and  be  0  otherwise, 
and  let  e1  e  R7  be  the  vector  all  of  whose  components  are 
zero  except  for  the  /th.  which  is  one.  Then,  for  1  <  j  <  J, 
there  exists  a  8,  >  0  such  that,  for  0  <  a;  <  Sj, 

<p(x  —  XjSjej)  <  <p(x).  (A7) 

This  is  obvious  if  sj  =  0.  Otherwise,  p^(x)  exists  and  in¬ 
dicates  f  increases  at  x  if  Sj  —  1  or  that  </>  decreases  at  x  if  sj 
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=  —  1 .  The  existence  of  the  desired  <5,  can  be  derived  from  the 
standard  definition  of  the  partial  derivative  as  a  limit. 

We  define  8  >  0  by 


8  = 


mm 


J  j€l 


I  Si 


Then  we  have  that,  for  0  <  X  <  8, 

j 


(p(x  +  Xd)  —  <p  I  x  —  x'Yj 


SteJ 


j=  i 


ffll 


j= i 


i^-i  j 


<  ~  y#(*) 


j= i 
=  <K*)- 


(A8) 


(A9) 


The  first  inequality  above  follows  from  the  convexity  of  </> 
and  the  second  one  follows  from  Eq.  (A7),  with  Xj  defined  to 
be  combined  with  Eq.  (A8).  Thus  d  is  a  nonascending 

vector  for  0  at  x.  □ 


a) Author  to  whom  correspondence  should  be  addressed.  Electronic 
mail:  gabortherman@yahoo.com;  URL:  http://www.dig.cs.gc.cuny.edu/ 
gabor/index  .html . 
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Abstract  The  projected  subgradient  method  for  constrained  minimization  repeat¬ 
edly  interlaces  subgradient  steps  for  objective  function  descent  with  projections  onto 
the  feasible  region,  which  is  the  intersection  of  closed  convex  constraints  sets,  to  regain 
feasibility.  The  latter  pose  a  computational  difficulty  and  therefore  the  projected  sub¬ 
gradient  method  is  applicable  only  when  the  feasible  region  is  “simple  to  project  onto”. 
In  contrast  with  this,  in  the  superiorization  methodology  a  feasibility-seeking  algorithm 
leads  the  overall  process  and  objective  function  descent  steps  are  interlaced  into  it.  This 
makes  a  difference  because  the  feasibility-seeking  algorithm  employs  projections  onto 
the  individual  constraints  sets  and  not  onto  the  whole  feasible  region. 

We  present  the  two  approaches  side-by-side  and  demonstrate  their  performance  on 
a  problem  of  computerized  tomography  image  reconstruction  posed  as  a  constrained 
minimization  problem  aiming  at  finding  a  constraint-compatible  solution  that  has  a 
reduced  value  of  the  total  variation  of  the  reconstructed  image. 

Keywords  constrained  minimization,  feasibility-seeking,  bounded  convergence, 
superiorization,  projected  subgradient  method,  proximity  function,  strong  perturbation 
resilience,  image  reconstruction,  computerized  tomography 

PACS  65K05,  90C59,  65B99,  49M30,  90C90,  90C30 


1  Introduction 

Our  aim  in  this  paper  is  to  expose  the  recently-developed  superiorization  methodol¬ 
ogy  (SM)  and  its  ideas  to  the  optimization  community  by  “confronting”  it  with  the 
projected  subgradient  method  (PSM).  The  reasons  for  this  choice  are  explained  below. 
Throughout  this  paper  we  assume  that  f?  is  a  nonempty  subset  of  the  J-dimensional 
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Euclidean  space  RJ .  We  consider  constrained  minimization  problems  of  the  form 

minimize  {<j>{x)  \  x  £  C}  ,  (1) 

where  <f>  :  R:J  — >  R  is  an  objective  function  and  C  C  Q  is  a  given  feasible  set.  Such 
nonlinear  constrained  optimization  problems  lie  at  the  heart  of  optimization  theory 
and  practice  and  constitute  mathematical  models  for  many  scientific  and  real-world 
applications.  Various  theories  and  different  methods  abound  for  fulfilling  the  task  of 
finding  a  point  that  is  both  feasible  for  the  constraints  and  renders  a  minimal  value 
to  the  objective  function.  Some  approaches,  known  as  regularization  methods,  fold  the 
feasibility-seeking  part  of  the  overall  task  into  the  minimization  via  penalty  functions. 
Others  assume  a  feasible  point  as  a  starting  point  and  conduct  the  search  for  minimality 
while  preventing  the  iterates  of  an  algorithm  from  leaving  the  constraints  by  using 
barrier  functions.  Still  others  deal  with  feasibility-seeking  and  the  search  for  minimality 
as  two  separate  tasks,  see,  e.g.,  [19]. 

In  this  paper  we  juxtapose  the  projected  subgradient  method  (PSM)  [sometimes 
referred  to  also  as  the  {sub) gradient  projection  method]  with  the  recently-developed 
superiorization  methodology  (SM)  and  demonstrate  their  performance  on  a  large-size 
real-world  application  that  is  modeled,  and  needs  to  be  solved,  as  a  constrained  min¬ 
imization  problem.  It  is  not  claimed  that  PSM  is  the  best  optimization  method  for 
solving  (1)  and  there  are  many  different  alternative  methods  to  which  SM  could  be 
compared.  So  why  did  we  chose  to  confront  PSM  with  our  SM?  In  a  nutshell,  our 
answer  is  that  both  methods  interlace  objective-function-reduction  steps  with  steps 
oriented  toward  feasibility,  but  they  differ  in  how  they  restore  or  preserve  feasibility. 
We  now  outline  these  two  methods  and  explain  our  choice  in  more  detail. 

The  PSM  for  constrained  minimization  has  been  extensively  investigated,  see,  e.g., 

[41,  Subsection  7.1.2],  [27,  Subsection  3.2.3].  Its  roots  are  in  the  work  of  Slior  [42]  for 

the  unconstrained  case  and  in  the  work  of  Polyak  [38, 39]  for  the  constrained  case.  More 

recent  work  can  be  found  in,  e.g.,  [5].  In  order  to  apply  the  PSM  to  solving  (1)  we  need 

to  assume  that  C  is  a  nonempty  closed  convex  set  and  that  0  is  a  convex  function. 

PSM  generates  a  sequence  of  iterates  •!  xk  >  according  to  the  recursion  formula 

1  J  k= o 

Xk+1  =  Pc  (xk  -  tk4>  ,  (2) 

where  tk  >  0  is  a  step-size,  <j>  £  d(f>  (xk ^  is  a  subgradient  of  <j>  at  xk,  and 

Pq  stands  for  the  orthogonal  (least  Euclidean  norm)  projection  onto  the  set  C.  The 
underlying  philosophy  is  to  perform  unconstrained  objective  function  descent  steps  via 
qk  :=  xk  —  tk(j>'  and  regain  feasibility  with  respect  to  C  after  each  such  step  by 

projecting  qk  onto  C.  Many  published  studies  give  various  sets  of  conditions  under 
which  sequences  generated  by  an  algorithm  that  includes  an  iterative  step  like  (2) 
converge  or  have  certain  other  desirable  properties  and  a  great  deal  of  experimental 
work  has  been  done  and  published  about  applying  such  methods  computationally. 

A  major  difficulty  with  (2)  is  the  need  to  perform,  within  each  iterative  step, 
the  orthogonal  projection.  If  the  feasible  set  C  is  not  “simple  to  project  onto”  then 
the  projection  requires  an  independent  inner-loop  calculation  to  minimize  the  distance 
from  the  point  qk  to  the  set  C,  which  can  be  costly  and  hamper  the  overall  effectiveness 
of  an  algorithm  that  uses  (2).  Also,  if  the  inner  loop  converges  to  the  projection  onto 
C  only  in  the  limit,  then  in  practical  implementations  it  will  have  to  be  stopped  after 
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a  finite  number  of  steps  and  so  xk+1  will  be  only  an  approximation  to  the  projection 
onto  C  and  it  could  even  happen  that  it  is  not  in  C. 

Even  if  we  set  aside  our  worries  about  projecting  onto  C  in  (2),  there  are  still  two 
concerns  when  applying  PSM  to  real-world  problems.  One  is  that  the  iterative  process 
usually  converges  to  the  desired  solution  only  in  the  limit.  In  practice,  some  stopping 
rule  is  applied  to  terminate  the  process  and  the  output  at  that  time  may  not  even  be 
in  C  and,  even  if  it  is  in  C,  it  is  most  unlikely  to  be  the  minimizer  of  <j>  over  C.  The 
second  problem  in  real-world  applications  comes  from  the  fact  that  the  constraints, 
derived  from  the  real-world  problem,  may  not  be  consistent  (e.g. ,  because  they  come 
from  noisy  measurements)  and  so  C  is  empty. 

Both  of  these  objections  can  be  handled  by  replacing  the  notion  of  a  fixed  feasible 
set  C  by  that  of  a  nonnegative  real- valued  proximity  function  Proxc  ■  O  —*  R+-  This 
function  serves  as  an  indicator  of  how  incompatible  a  vector  x  is  with  the  constraints. 
In  such  a  formulation  the  merit  of  the  actual  output  x  of  any  algorithm  is  indicated 
by  the  smallness  of  the  two  numbers  Proxc  (x)  an<i  4>{ x )•  For  the  formulation  of  (1), 
we  would  define  Proxc  so  that  its  range  is  the  ray  of  nonnegative  real  numbers  with 
Proxc  (x)  =  0  if,  and  only  if,  x  £  C  and  then  the  constrained  minimization  problem  (1) 
is  precisely  that  of  finding  an  x  that  is  a  minimizer  of  rf>(x )  over  {x  \  Proxc(x)  =  0}. 
The  above  discussion  allows  us  to  do  away  with  the  nonemptiness  assumption  and  also 
to  compare  the  merits  of  actual  outputs  of  algorithms  that  only  approximate  the  aim 
of  the  constrained  minimization  problem. 

The  recently  invented  SM  incorporates  the  ideas  of  the  previous  paragraph  in  its 
very  foundation  and  formulates  the  problem  with  the  function  Proxc  instead  of  the 
set  C.  The  underlying  idea  of  SM  is  that  many  iterative  algorithms  that  produce 
outputs  x  for  which  ProxcyX)  is  small  are  strongly  perturbation  resilient  in  the  sense 
that,  even  if  certain  kinds  of  changes  are  made  at  the  end  of  each  iterative  step, 
the  algorithm  still  produces  an  output  x'  for  which  Proxc ( x')  is  not  larger.  This 
property  is  exploited  bv  using  permitted  changes  to  steer  the  algorithm  to  an  output 
that  has  not  only  a  small  Proxc  value,  but  has  also  a  small  (j>  value.  The  algorithm 
that  incorporates  such  a  steering  process  is  referred  to  as  the  superiorized  version  of  the 
original  iterative  algorithm.  The  main  practical  contribution  of  SM  is  the  automatic 
creation  of  the  superiorized  version,  according  to  a  given  objective  function  </>,  of  just 
about  any  iterative  algorithm  that  aims  at  producing  an  x  for  which  Proxc (x)  's  small. 

Nevertheless,  in  order  to  carry  out  our  comparative  study,  we  restrict  our  atten¬ 
tion  here  to  a  subset  of  all  possible  problems  to  which  not  only  SM  but  also  PSM  is 
applicable.  We  assume  that  we  are  given  a  family  of  constraints  {Ci}\=1^  where  each 
set  Ci  is  a  nonempty  closed  convex  subset  of  RJ  such  that 

L 

C=f^Ci  (3) 

l=i 


is  a  nonempty  subset  of  17  and  that  it  is  the  feasible  set  C  of  (1).  Under  these  as¬ 
sumptions,  we  illustrate  the  SM  by  the  superiorization  of  feasibility-seeking  projection 
methods ,  see,  e.g.,  [1,2,3,9,15]  and  the  recent  monograph  [8].  Such  methods  use  pro¬ 
jections  onto  the  individual  sets  Ct  in  order  to  generate  a  sequence  \xk  >  that 

l  J  k= 0 

converges  to  a  point  x*  £  C.  Therefore,  contrary  to  the  PSM  method,  one  does  not 
need  to  assume  that  C  is  a  “simple  to  project  onto”  set,  but  rather  that  the  individual 
sets  Ci  have  this  property.  The  latter  is  indeed  often  the  case,  such  as,  for  example, 
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when  the  sets  C/>  are  hyperplanes  or  half-spaces  onto  which  we  can  project  easily,  but 
their  intersection  is  not  “simple  to  project  onto”. 

The  SM  is  accurately  presented  below  in  Section  4.  But  the  discussion  above  is  suf¬ 
ficient  to  explain  why  we  chose  the  PSM  method  and  the  SM  for  our  comparative  study. 
Namely,  both  methods  interlace  objective-function-reduction  steps  with  steps  oriented 
toward  feasibility.  But  exactly  here  lies  a  big  difference  between  the  two  approaches. 
The  PSM  method  requires  that  feasibility  is  regained  after  subgradient  nonascent  steps 
by  performing  a  projection  onto  C,  whereas  in  the  SM  the  feasibility-seeking  projection 
method  proceeds  bv  projecting  (in  a  well-defined  algorithmically-structured  regime  dic¬ 
tated  by  the  specific  projection  method)  onto  the  individual  sets  Cy  and  not  onto  the 
whole  feasible  set  C.  This  has  a  potentially  great  computational  advantage. 

In  this  paper  we  demonstrate  the  approaches  of  SM  and  PSM  on  a  realistically- 
large-size  problem  with  data  that  arise  from  the  significant  problem  of  x-ray  computed 
tomography  (CT)  with  total  variation  (TV)  minimization.  In  Section  2  we  discuss 
some  related  work,  in  Section  3  we  describe  the  PSM  method  that  we  use,  in  Section 
4  the  SM  is  presented,  and  in  Section  5  the  computational  demonstration  is  reported, 
followed  by  some  conclusions  in  Section  6. 


2  Related  previous  work 

In  this  section  we  briefly  describe  the  relationships  of  superiorization  with  previously 
published  work  on  related  methods.  The  points  made  in  the  second  half  of  Subsection 
2.2  are  of  particular  importance. 


2.1  Previous  work  on  superiorization 

The  superiorization  methodology  was  first  proposed  (although  without  using  the  term 
superiorization)  in  [7].  In  that  work  perturbation  resilience  (without  using  this  term) 
was  proved  for  the  general  class  of  string-averaging  projection  (SAP)  methods,  see 
[11,12,13,14,36],  that  use  orthogonal  projections  and  relate  to  consistent  constraints. 
Subsequent  investigations  and  developments  were  done  in  [10,17,21,31,37].  In  [10]  the 
methodology  was  formulated  over  general  problem,  structures  which  enabled  rigorous 
analysis  and  revealed  that  the  approach  is  not  limited  to  feasibility  and  optimization. 
In  [17]  perturbation  resilience  was  analyzed  for  the  class  of  block-iterative  projection 
(BIP)  methods,  see  [1,2,3,9,15],  and  applied  in  this  manner.  In  [21]  the  advantages 
of  superiorization  for  image  reconstruction  from  a  small  number  of  projections  was 
studied,  and  in  [31]  two  acceleration  schemes  based  on  (symmetric  and  nonsymmetric) 
BIP  methods  were  proposed  and  experimented  with.  In  [37]  total  variation  superi¬ 
orization  schemes  in  proton  computed  tomography  (pCT)  image  reconstruction  were 
investigated. 

In  [22]  we  introduced  the  notion  of  e-compatibility  into  the  superiorization  ap¬ 
proach  in  order  to  handle  inconsistent  constraints.  This  enabled  us  to  close  the  logical 
discrepancy  between  the  assumption  of  consistency  of  constraints  and  the  actual  ex¬ 
perimental  work  done  previously.  We  also  introduced  there  the  new  notion  of  strong 
perturbation  resilience  which  generalizes  the  previously  used  notion  of  perturbation 
resilience.  Algorithmically,  the  new  superiorized  algorithm  introduced  there  (and  used 
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here)  is  different  from  all  previous  ones  in  that  it  uses  the  notion  of  nonascending  di¬ 
rection  and  in  that  it  allows  several  perturbation  steps  for  each  feasibility-seeking  step, 
an  aspect  that  has  practical  advantages. 

In  [23]  superiorization  was  applied  to  the  expectation  maximization  (EM)  algorithm 
instead  of  the  feasibility-seeking  projection  methods  that  were  used  in  superiorization 
previously.  The  approach  was  implemented  there  to  solve  an  inverse  problem  of  bio¬ 
luminescence  tomography  (BLT)  image  reconstruction.  Such  EM  superiorization  was 
investigated  further  and  applied  to  a  problem  of  Single  Photon  Emission  Computed 
Tomography  (SPECT)  in  [25].  Most  recently,  the  SM  was  further  investigated  numeri¬ 
cally,  along  with  many  projection  methods  for  the  feasibility  problem  and  for  the  best 
approximation  problem,  in  [4]. 


2.2  The  works  of  of  Nurminski,  of  Helou  Neto  and  De  Pierro,  and  of  Nedic 

Our  superiorization  methodology  should  be  distinguished  from  the  works  of  Helou  Neto 
and  De  Pierro  [30,29],  of  Nedic  [40,26],  and  of  Nurminski  [33,32,34,35].  The  lack  of 
cross-referencing  between  some  of  these  papers  shows  that,  in  spite  of  the  similarities 
between  their  approaches,  their  results  were  apparently  reached  independently. 

Nurminski:  The  algorithms  of  Nurminski  use  Fejer  operators,  that  can  be  used  in 
feasibility-seeking,  and  introduces  into  them  disturbances  with  diminishing  step-sizes 
A*,  — +  0  as  k  —>  oo,  where  the  rate  of  this  tendency  is  such  that  ^/*L0  A*,  =  +oo. 
Under  these  conditions  and  a  variety  of  additional  assumptions,  Nurminski  showed 
asymptotic  convergence  of  the  iterates  generated  by  his  algorithms  to  a  minimum 
point  of  the  constrained  minimization  problem. 

Helou  Neto  and  De  Pierro:  The  framework  proposed  by  Helou  Neto  and  De 
Pierro  uses  interlacing  of  ‘feasibility  operators”  with  “optimality  operators”  with  the 
aim  of  creating  exact  constrained  minimization  algorithms.  Similarly  to  Nurminski, 
they  employ  diminishing  step-sizes  A*.  — >  0  as  k  — >  oo  such  that  =  +oo- 

Under  these  conditions  and  a  variety  of  additional  assumptions,  different  than  those 
of  Nurminski,  they  show  asymptotic  convergence  of  the  iterates  generated  by  their 
algorithmic  framework  to  a  minimum  point  of  the  constrained  minimization  problem. 

However,  when  it  comes  to  derivation  of  specific  algorithms  from  the  general  frame¬ 
work  of  [29,  Equation  (3)],  their  feasibility  operator  T  invariably  takes  the  form 

Tf(x)  =  x  —  /i(s:)VF(i),  (4) 

where  the  function  F(x),  whose  gradient  is  calculated,  is  “a  convex  function  such  that 
the  set  of  minima  of  this  function  coincides  with  the  set  X  [in  [29,  Equation  (3)]  X  is  the 
feasible  set  of  the  minimization  problem  and  should  be  identified  with  C  =  nil  Q  in 
our  notation]  when  it  is  not  empty  and  defines  a  solution  in  an  appropriate  way  (least 
squares  for  example)  otherwise”  and  p(x)  are  some  parameters  that  are  restricted  in 
a  particular  manner  as  in  [29,  Lemma  7  or  Corollary  8].  This  function  F(x)  may  be 
identified  with  what  we  call  the  proximity  function  Proxc{ x)  in  the  superiorization 
methodology,  but  our  feasibility-seeking  algorithms,  and  hence  their  versions  produced 
by  our  superiorization  methodology,  are  not  limited  to  the  form  of  (4). 

Nedic:  The  overall  approach  of  Nedic  is  to  apply  gradient  and  subgradient  itera¬ 
tive  methods  for  the  objective  function  minimization  and  interlace  into  them  random 
feasibility  updates.  Her  resulting  “random  projection  method”  [26,  Equation  (4)]  bears 
structural  similarity  to  our  SM  but  again  the  diminishing  step-sizes  — >  0  as  k  — ►  oo 
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in  [26,  Equation  (4)]  must  be  such  that  &k  =  +oo,  see  [26,  Proposirion  1  and 

Proposition  2].  The  randomness  refers  to  the  way  the  constraints  are  picked  up  for  the 
feasibility  updates. 

To  summarize,  there  are  various  differences  among  all  the  above  works  and  between 
them  and  our  work,  differences  in  overall  setup  of  the  problems,  differences  in  the 
assumptions  used  for  the  various  convergence  results,  etc.  This  is  not  the  place  for  a 
full  review  of  all  these  differences.  But  we  wish  to  clarify  the  fundamental  difference 
between  them  and  the  SM.  The  point  is  that  when  two  activities  are  interlaced  (here, 
feasibility  steps  and  objective  function  reduction  steps)  then  once  the  process  is  running 
all  such  methods  look  similar.  Prom  looking  at  the  iterative  formulas  one  cannot  tell 
if  (a)  “feasibility  steps  are  interlaced  into  an  iterative  gradient  scheme  for  objective 
function  minimization”  or  if  (b)  “objective  function  reduction  steps  are  interlaced  into 
an  iterative  projections  scheme  for  feasibility-seeking”. 

The  common  thread  of  all  above  mentioned  works  is  that  they  fall  into  the  category 
(a),  while  the  SM  is  of  the  kind  (b).  In  all  methods  of  category  (a)  the  condition  that  is 
needed  to  guarantee  convergence  to  a  constrained  minimum  point  is  that  the  diminish¬ 
ing  step-sizes  Qj,  — »  0  as  k  — >  oo  must  be  such  that  ak  =  +00-  In  contrast,  since 

the  feasibility-seeking  projection  method  is  the  “leader”  of  the  overall  process  in  the 
SM,  we  must  have  that  the  perturbations  (that  do  the  objective  function  reduction)  will 
use  diminishing  step-sizes  (3^  — >  0  as  k  — >  oo  but  such  that  Ac  <  oo.  The  latter 

condition  guarantees  the  perturbation  resilience  of  the  original  feasibility-seeking  pro¬ 
jection  method  so  that,  regardless  of  the  interlaced  objective  function  reduction  steps, 
the  overall  process  converges  to  a  feasible,  or  e-compatible,  point  of  the  constraints. 

Yet  another  fundamental  difference  between  the  superiorization  methodology  and 
the  algorithms  of  category  (a)  mentioned  above  is  that  those  algorithms  perform  the 
interlaced  objective  function  descent  and  feasibility  steps  alternatingiy  according  to  a 
rigid  predetermined  scheme,  whereas  in  the  superiorization  methodology  the  activation 
of  these  steps  and  the  decisions  whether  to  keep  an  iterate  or  discard  it  are  done 
inside  the  superiorized  algorithm  in  a  controlled  and  automatically-supervised  manner. 
Thus,  the  superiorization  methodology  has  the  following  features  not  present  in  the 
algorithms  of  category  (a)  mentioned  above:  (i)  it  conducts  iterations  of  a  feasibility¬ 
seeking  projection  method  which  is  strongly  perturbation  resilient  (as  defined  below), 
(ii)  it  interlaces  objective  function  nonascent  steps  into  the  process  in  a  controlled  and 
automatically-supervised  manner,  (iii)  it  is  not  known  to  guarantee  convergence  to  a 
solution  of  the  constrained  minimization  problem  and  it  might  (we  do  not  know  if 
this  is  so  or  not)  instead,  only  be  shown  to  lead  to  a  feasible  point  whose  objective 
function  value  is  less  than  that  of  a  feasible  point  that  would  have  been  reached  by  the 
same  feasibility-seeking  projection  method  without  the  perturbations  exercised  by  the 
superiorized  algorithm. 


2.3  Adaptive  steepest  descent  projections  onto  convex  sets 

The  adaptive  steepest  descent  projections  onto  convex  sets  (ASD-POCS)  algorithm  is 
described  in  detail  in  [44]  and  its  use  has  since  been  reported  in  a  number  of  subsequent 
publications  (for  example,  in  [43])  in  the  CT  literature.  It  interlaces  function  descent 
steps  with  projections  onto  convex  sets.  However,  it  is  not  as  general  as  the  SM;  see 
[22]  for  a  comparison. 
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3  The  projected  subgradient  method 


In  this  section  we  describe  the  projected  subgradient  method  (PSM)  and  how  we  apply 
it.  We  follow  the  presentation  given  in  [27,  Subsection  3.2.3],  using,  in  particular, 
Theorem  3.2.2  therein.  This  convergence  theorem  requires  convexity  and  local  Lipschitz 
continuity  of  the  objective  function  (f>  and  that  the  feasible  set  C  in  (1)  is  closed  and 
convex.  The  latter  is  indeed  the  case  here  since  we  assume  that  the  feasible  set  C  is 
given  by  (3),  with  all  the  conditions  stated  there.  The  PSM  generates  a  sequence  of 
iterates 


{ xk  1  according  to  the  recursion  formula 

1  J  k= o 


fc+l 


Pc  (xk  -  tk(j>'  , 


(5) 


where  tk  >  0  is  a  step-size,  <j> '  (xk^  £  d<t>  *s  a  subgradient  of  <j>  at  xk ,  and  Pq 

stands  for  the  orthogonal  projection  onto  the  set  C.  The  PSM  has  been  investigated 
under  several  types  of  step-size  rules.  Some  commonly  used  such  rules  can  be  found, 
e.g.,  in  [6].  We  adopt  a  nonsummable  diminishing  step-length  rule  of  the  form  tk  = 
7k/ \<t>'  (®fc)  ||>  where  -yk  >  0,  lim^^  7fe  =  0,  EfcLo  7fc  =  oo.  All  the  above  is 
precisely  identical  with  the  method  in  [27,  Equation  (3.2.8)]  and  under  these  conditions 
[27,  Theorem  3.2.2]  guarantees  convergence  of  ^ xk ^  j-  ^_0  to  a  minimum  value  of  (j> 
over  C. 

The  next  issue  that  we  discuss  is  how  to  compute  the  projection  Pq  which 
is  required  in  the  PSM  (5).  To  do  this,  one  has  to  solve  the  optimization  problem  of 
finding 


arg  mm 


1  II  ||  2 

2  X  ~  ^ 


€C= 

1=1  J 


(6) 


for  q  =  xk  —  tk(j>'  that  has  been  calculated.  We  adopt  the  well-trodden  path 

of  solving  (6)  via  duality  theory.  We  introduce  new  vector  variables  j/  £  RJ ,  for  all 
£  =  1,2 and  rewrite  the  problem  (6)  as  the  equivalent  one  of  finding 


arg  mm 
x^y1  ,y2,...,yJ 

such  that 

and 


+b_\\x-qf 

yC  =  x,  £=1,2,...,L 
ye£Ce,  i  =  1,2, ...  ,L. 


(7) 


By  assigning  dual  vectors  £  RJ ,  l  =  1.2...../.,  to  every  equality  constraint  in  (7) 
we  obtain  the  dual  function 


D  I 


^A1,  A2, . . . ,  A L)  :=  J2e=i  min  ji  “  l\\  +  (AW)  I  y(  e 

+  min  1 1  ||x  —  q\\2  —  (j2(=i  I  x  e  ^?'7}  • 

Some  rearrangements  of  terms  then  lead  to 


(8) 


D  (V,  A2,. . . ,  A =  Efcimin|^  ||/  -  (q~  a<?)| 


-h  1  -A7 


y  e  Ct 


+  min  1 1  II*-  g||2  -  ^E£=i  X^x)  I  x  G  -R'7}  ■ 
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By  solving  the  minimization  problems  on  the  right-hand  side  of  the  above  equation  we 
obtain 

D  (A1,  A2,  ...,\L)  =  Eir  |  ( q  ~  \l)  -  PCt  («  -  A*)  f  -  I  \\q  ~  >?( ) 

-  slEiiAf  -(EiiV,«). 

It  can  be  shown  that  this  function  is  concave  and  has  a  Lipschitz  continuous  gradient. 
When  the  projections  onto  each  C ^  are  available,  then  the  values  and  gradients  of  this 
dual  function  D  are  also  available.  Thus,  the  dual  problem  is  the  convex  unconstrained 
optimization  problem  of  finding 

arg  max  {d  ( A1,  A2, . . . ,  XL)  \  \e  €  RJ ,  £=1,2  (11) 

A1, A2,..., A1,  f  V  /  J 

where  the  objective  function  D  ^A1,  A2, . . . ,  is  concave  and  smooth,  and  its  values 

and  gradients  are  available  at  any  point  ^A1,  A2, . . . ,  XL>j .  The  optimal  point  x*  of  the 
primal  problem  (7)  is  then  given  by 

x*=q  +  YJ  A*^,  (12) 

e=i 

where  ^A*1,  A*2, . . . ,  A*"^  is  an  optimal  point  of  the  dual  problem  (11). 

A  possible  drawback  of  this  approach  via  duality  theory  is  that  by  introducing  new 
variables  in  (7)  we  increase  the  number  of  variables  in  the  dual  problem.  It  turns  out 
that  in  some  special  cases  of  the  problem  (6)  we  can  efficiently  solve  the  corresponding 
dual  problem  without  enlarging  its  size.  One  of  these  special  cases  occurs  when  in 
problem  (6)  /.  —  /  i  1 .  the  sets  Cj  :=  €  RJ  \  =  bij-,  for  i  =  1,2, ...  ,1,  are 

hyperplanes  where  the  vector  a 1  is,  for  i  =  1,2 the  *th  row  of  the  matrix  A , 
b  =  (&i,  &2j  ■  ■  ■  j  bj),  and  C/_|_i  :=  €  RJ  0  <  x  <  1  j  is  an  additional  convex  set  of 
box  constraints;  that  is,  the  problem  (6)  takes  the  form  of  finding 

(13) 


arg  mm 


1  II  ||  2 

2  I'*  _  ^11 


Ax  =  b  and  0  <  x  <  1 
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The  problem  (14)  can  be  solved  by  the  optimal  method  of  Nesterov  [28].  We  give  its 
description  for  the  problem  of  unconstrained  minimization  of  a  function  /(*),  assuming 
that  f(x)  is  convex  and  continuously  differentiable  with  Lipschitz  continuous  gradients; 
i.e. ,  that  there  exists  a  constant  L  >  0  such  that,  for  all  x,  y  £  RJ , 

l|V/(a:)-V/(y)||  <L\\x-y\\.  (17) 


Nesterov’s  Algorithm 

(Nl)  Initialization:  Select  a  point  y°  £  RJ  and  put  k  =  0,  /3o  =  1,  a:-1  = 
y°,  a_i  =  j| y°  —  z||  /  ||  V/  (y°)  —  V/(z)||,  where  z  is  any  point  in  RJ  such  that 
and  V/(z)  ^  V/(y°). 

(N2)  Iterative  Step:  Given  xk~1 ,  yk ,  Ofc_i  and  (3 ^ 

(N2.1)  Calculate  the  smallest  index  s  >  0  for  which  the  following  inequality  holds 

/  (yk)  -  f(yk  -  2-safe_i V/  (/))  >  2_s_1afc_1  ||v/  (/')  ||2  .  (18) 

(N2.2)  Calculate  the  next  iterate  by 

ak  =  2~sak-i  and  xk  =  yk  -  akV f  (yk^j  ,  (19) 

and  update 

Pk+l  =  +  2  \j >  (2°) 

and 

fc+1  k  -  flk  1  /  k  k—l\  / <-) -i  \ 

y  =  x  +-pj-tT(x  -x  )■  (21) 

When  a  stopping  rule  applies,  then  the  point  xk  is  the  output  of  the  method. 


4  The  superiorization  methodology 

In  this  section  we  present  a  restricted  version  of  the  SM  of  [22]  adapted  to  our  problem 
(1).  As  discussed  in  Section  1,  we  associate  with  the  feasible  set  C  in  (1)  a  proximity 
function  Proxc  ■  f?  — >  R+  that  is  an  indicator  of  how  incompatible  an  x  £  1?  is 
with  the  constraints.  For  any  given  e  >  0,  a  point  x  £  for  which  Proxc (x)  <  £  is 
called  an  e-compatible  solution  for  C.  We  further  assume  that  we  have,  for  the  C  in 
(1),  a  feasibility-seeking  algorithmic  operator  Ac  '■  RJ  — >  12,  with  which  we  define  the 
following  basic  algorithm. 

The  Basic  Algorithm 

(Bl)  Initialization:  Choose  an  arbitrary  x°  £  1?, 

(B2)  Iterative  Step:  Given  the  current  iterate  xk,  calculate  the  next  iterate  xk+1  by 

xk+1  =  Ac  ( xk )  .  (22) 


The  following  definition  helps  to  evaluate  the  output  of  the  Basic  Algorithm  upon 
termination  by  a  stopping  rule. 
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Definition  1  The  e-output  of  a  sequence 

T  f  I  'i  OO 

Given  C  C  RJ ,  a  proximity  function  Proxc  ■  fi  — +  f?+ ,  a  sequence  {  xK  >  C  PI 

V  J  k— o 

and  an  e  >  0,  then  an  element  xK  of  the  sequence  which  has  the  properties:  (i) 
Proxc  (a:^)  —  e’  and  (**)  Proxc  >  £  f°r  all  0  <  k  <  K,  is  called  an  e-output 

f  i.  i  00 

of  the  sequence  <  aY  }•  with  respect  to  the  pair  (C,  Proxc)-  We  denote  it 
by  0(c,e,{xk}™J=xK- 


Clearly,  an  e-output  O  (c,  e,  |  j  of  a  seciuence  might  or  might 

{i t  *|  OO 

xK  >  is  produced  by  an  algorithm 
J  k= o 

intended  for  the  feasible  set  C,  such  as  the  Basic  Algorithm,  without  a  termination 
criterion,  then  O  (c,e,  )  is  the  output  produced  by  that  algorithm  when  it 

includes  the  termination  rule  to  stop  when  an  e-compatible  solution  for  C  is  reached. 


Definition  2  Strong  perturbation  resilience 

Assume  that  we  are  given  a  C  C  17,  a  proximity  function  Proxc j  an  algorithmic 

operator  Ac  and  an  x°  £  fi.  We  use  \xk\  to  denote  the  sequence  generated  by 

l  J  k= 0 

the  Basic  Algorithm  when  it  is  initialized  by  a:.  The  Basic  Algorithm  is  said  to  be 
strongly  perturbation  resilient  if  the  following  hold: 

(i)  there  exist  an  e  >  0  such  that  the  e-output  O  (c,£,  |:rfc|  )  is  defined  for 
every  x°  £  fi; 

(ii)  for  every  e  >  0,  for  which  the  e-output  O  [c,£,  j  is  defined  for  every 

x°  £  1?,  we  have  also  that  the  ^-output  O  (c,  £* ,  j  'j  is  defined  for  every  e7  >  e 

f  k 1  00  ° 

and  for  every  seciuence  <y  >  generated  by 
1  J  k= o 

yk+1  =  Ac  ( yk  +  /3kvk^j  ,  for  all  k  >  0,  (23) 

{,  'j  OO 

vk  >  is  bounded  and  the  scalars  {/3fc}^ 0  are  such  that 
Pk  >  0,  for  all  k  >  0,  and  fik  <  oo. 


Definition  3  Bounded  convergence 

Assume  that  we  are  given  a  C  C  RJ ,  a  proximity  function  Proxc  and  an  algorith¬ 
mic  operator  Ac  :  RJ  — *  Pi-  Then  the  Basic  Algorithm  is  said  to  be  convergent  over 
fi  if  for  every  x°  £  fi  there  exist  the  limit  lim^^  xk  =  y  ( x° )  and  y  (a:0)  £  fi.  It 
is  said  to  be  boundedly  convergent  over  fi  if,  in  addition,  there  exists  a  7  >  0  such 
that  Proxc  {v  ix°))  —  7  f°r  every  x°  £  fi. 


The  next  theorem,  which  gives  sufficient  conditions  for  strong  perturbation  re¬ 
silience  of  the  Basic  Algorithm,  has  been  proved  in  [22,  Theorem  1]  (in  different  word¬ 
ing). 

Theorem  1  Assume  that  we  are  given  a  C  C  R'\  a  proximity  function  Proxc  and  an 
algorithmic  operator  Ac  '■  — >  fi-  If  Ac  is  nonexpansive  and  is  such  that  it  defines  a 

boundedly  convergent  Basic  Algorithm  and  if  the  proximity  function  Proxc  uniformly 
continuous,  then  the  Basic  Algorithm  defined  by  Ac  is  strongly  perturbation  resilient. 
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Along  with  the  C  C  RJ ,  we  look  at  the  objective  function  cj>  :  R'J  — +  R,  with  the 
convention  that  a  point  in  RJ  for  which  the  value  of  tf>  is  smaller  is  considered  superior 
to  a  point  in  RJ  for  which  the  value  of  (j>  is  larger.  The  essential  idea  of  the  Superior¬ 
ization  Methodology  (SM)  is  to  make  use  of  the  perturbations  of  (23)  to  transform  a 
strongly  perturbation  resilient  algorithm  that  seeks  a  constraints-compatible  solution 
for  C  into  one  whose  outputs  are  equally  good  from  the  point  of  view  of  eonstraints- 
compatibility,  but  are  superior  (not  necessarily  optimal)  according  to  the  objective 
function  <j>. 

This  is  done  by  producing  from  the  Basic  Algorithm  another  algorithm,  called  its 
superiorized  version,  that  makes  sure  not  only  that  the  PkVk  are  bounded  perturba¬ 
tions,  but  also  that  tj>  ^ yk  +  (3yvk ^  <  <j>  (l^)’  f°r  ^  —  0-  To  do  so,  we  use  the  next 
concept,  closely  related  to  the  concept  of  “descent  direction”. 

Definition  4  Given  a  function  (j>  :  R'1  — >  R  and  a  point  y  £  RJ,  we  say  that  a  vector 
d  £  RJ  is  nonascending  for  (j>  at  y  if  ||d||  <  1  and  there  is  a  5  >  0  such  that 

for  all  A  £  [0,  5]  we  have  <j>(y  +  A d)  <  <fi  ( y ) .  (24) 

Obviously,  the  zero  vector  is  always  such  a  vector,  but  for  superiorization  to  work 
we  need  a  sharp  inequality  to  occur  in  (24)  frequently  enough. 

The  Superiorized  Version  of  the  Basic  Algorithm  assumes  that  we  have  available 

a  summable  sequence  {r/e }^.0  of  positive  real  numbers  (for  example,  r/e  =  a where 

r  ,  i  oo 

0  <  a  <  1)  and  it  generates,  simultaneously  with  the  sequence  It/  i  in  (7,  sequences 

{,  -i  oo  0 

vk  >  and  {/3fe }^_q .  The  latter  is  generated  as  a  subsequence  of  resulting 

J  k= o 

in  a  nonnegative  summable  sequence  {Pk}'kLo-  The  algorithm  further  depends  on  a 
specified  initial  point  y°  £  17  and  on  a  positive  integer  N.  It  makes  use  of  a  logical 
variable  called  loop.  The  superiorized  algorithm  is  presented  next  by  its  pseudo-code. 

Superiorized  Version  of  the  Basic  Algorithm 

1.  set  k  =  0 

2.  set  yk  =  y° 

3.  set  £  =  —  1 

4.  repeat 

5.  set  n  =  0 

6.  set  yk’n  =  yk 

7.  while  n<N 

8.  set  vk’n  to  be  a  nonascending  vector  for  <j>  at  yk,n 

9.  set  loop=true 

10.  while  loop 

11.  set  £  =  £  +  1 

12.  set  /3fei„  =  ye 

13.  set  z  =  yk,n  +  Pk,nVk,n 

14.  if  <j)  (z)<4>  (yk 'j  then 

15.  set  n=n  +  1 

16.  set  yk,n=z 

17.  set  loop  =  false 

18.  set  yk+1=Ac  (V^) 
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19.  set  k  =  k  +  1 


According  to  the  analysis  of  the  behavior  of  this  algorithm  in  [22],  the  algo- 

f  k  1  00 

rithm  produces  a  sequence  \  y  \  for  which  (23)  is  satisfied.  Further,  the  follow¬ 
ing  important  fact  is  shown  to  be  true  in  [22]:  If,  for  a  given  e  >  0,  the  e-output 
O  (c,£,  |  )  °f  the  Basic  Algorithm  is  defined  for  every  x°  £  J ?,  then  every  se¬ 

quence  j  generated  by  the  Superiorized  Version  of  the  Basic  Algorithm  has  an 


fc=o 


e^output  O  ^C,s',  j  )  for  every  s'  >  s.  In  other  words,  the  Superiorized  Version 


produces  outputs  that  are  essentially  as  constraints  cmnpat ible  as  those  produced  by 
the  original  (not  superiorized)  algorithm.  However,  due  to  the  repeated  steering  of  the 
process  by  lines  7  to  17  toward  reducing  the  value  of  the  objective  function  <j>,  we  can 
expect  that  the  output  of  the  Superiorized  Version  will  be  superior  (front  the  point  of 
view  of  (j>)  to  the  output  of  the  original  algorithm. 


5  A  computational  demonstration 

5.1  The  x-ray  CT  problem 

The  fullv-discretized  model  in  the  series  expansion  approach  to  the  image  reconstruc¬ 
tion  problem  of  x-ray  computerized  tomography  (CT)  is  formulated  in  the  following 
manner.  A  Cartesian  grid  of  square  picture  elements,  called  pixels ,  is  introduced  into 
the  region  of  interest  so  that  it  covers  the  whole  picture  that  has  to  be  reconstructed. 
The  pixels  are  numbered  in  some  agreed  manner,  say  from  1  (top  left  corner  pixel)  to 
J  (bottom  right  corner  pixel). 

The  x-rav  attenuation  function  is  assumed  to  take  a  constant  value  Xj  throughout 
the  jth  pixel,  for  j  =  1,  2, ...,  J.  Sources  and  detectors  are  assumed  to  be  points  and  the 
rays  between  them  are  assumed  to  be  lines.  Further,  assume  that  the  length  of  intersec¬ 
tion  of  the  *th  ray  with  the  jth  pixel,  denoted  by  a*-,  for  *=1,2, ...,  /,  j  =  1,  2, ...,  J, 
represents  the  weight  of  the  contribution  of  the  jth  pixel  to  the  total  attenuation  along 
the  ith  ray. 

The  physical  measurement  of  the  total  attenuation  along  the  ith  rav,  denoted  by 
bj ,  represents  the  line  integral  of  the  unknown  attenuation  function  along  the  path  of 
the  ray.  Therefore,  in  this  fully-discretized  model,  the  line  integral  turns  out  to  be  a 
finite  sum  and  the  model  is  described  by  a  system  of  linear  equations 

J 

5 ~Zx3a)  =  bi ,  *  =  1,2,...,/.  (25) 

i= i 

In  matrix  notation  we  rewrite  (25)  as 


Ax  =  b,  (26) 

where  b  £  R 1  is  the  measurement  vector ,  x  £  R'J  is  the  image  vector,  and  the  I  X  J 
matrix  A  =  is  the  projection  matrix.  See  [20],  especially  Section  6.3,  for  a  complete 
treatment  of  this  subject. 
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5.2  The  algorithms  that  we  use 

In  this  section  we  describe  the  PSM  and  SM  algorithms  specifically  used  in  our  demon¬ 
stration.  We  applied  both  algorithms  to  solve  the  fully-discretized  model  in  the  series 
expansion  approach  to  the  image  reconstruction  problem  of  x-ray  CT,  formulated  in 
the  previous  subsection  and  represented  by  the  optimization  problem 

minimize  {</>(*)  |  Ax  =  b  and  0  <  x  <  1}  .  (27) 

The  box  constraints  are  natural  for  this  problem:  If  x j  represents  the  linear  atten¬ 
uation  coefficient,  measured  in  cm"1,  at  a  medically-used  x-ray  energy  spectrum  in  the 
jth.  pixel,  then  the  box  constraints  0  <  x  <  1  are  reasonable  for  tissues  in  the  human 
body;  see  Table  4.1  of  [20].  Hence,  for  the  image  reconstruction  problem  of  x-ray  CT, 
we  define  12  by 

12  =  {*  €  RJ  |  0  <  *  <  l}  .  (28) 

We  note  that  this  12  is  bounded. 

The  choice  of  C  in  (1)  is  of  the  type  specified  in  (3),  with  L  =  I  +  1,  Cj  = 
I*  £  RJ  |  (^al , x'j  =  bi  j-,  for  i  =  1,2 ,...,/  and  Ci+\  =  12.  Note  that  these  are  exactly 
the  conditions  that  were  used  to  derive  in  Section  3  the  special  case  of  the  PSM 
expressed  in  (13)  and  the  equations  that  follow  it.  Furthermore,  since  in  the  experiment 
reported  below,  we  start  with  a  specific  image  vector  x  £  J?  and  calculate  from  it  the 
measurement  vector  b  £  R1  using  (25),  we  know  that  C  is  a  nonempty  subset  of  12, 
which  is  the  requirement  stated  below  (3). 

For  any  such  C,  we  define  Proxc  ■  12  — >  R+  by 


Proxc (x) 


\ 


12  b> 


{a\x)2. 


(29) 


Note  that  this  proximity  function  Proxc  is  uniformly  continuous  and  thus  satisfies  the 
condition  stated  for  it  in  Theorem  1. 

Our  choice  for  the  objective  function  (j>  is  the  total  variation  (TV)  of  the  image 
vector  x.  Denoting  the  G  x  H  image  array  X  ( GH  =  J)  obtained  from  the  image 
vector  x  by  Xg  ^  =  x^g_i')^+fl,  for  1  <  g  <  G  and  1  <  h  <  H,  we  use 


G—l  H- 1 


<t>{x)=TV{X)=  s] (xg+hh  -  X9ih)2  +  (xg,h+1  -  Xg,hy 


3=1  h=  1 


(30) 


5.2.1  Projected  Subgradient  Method  (PSM) 

As  mentioned  at  the  beginning  of  Section  3,  according  to  [27,  Theorem  3.2.2],  we  need 
to  verify  convexity  and  local  Lipschitz  continuity  of  the  objective  function  in  (30). 
Convexity  of  the  TV  objective  function  follows,  for  example,  from  the  end  of  the  Proof 
of  Proposition  1  in  [16].  The  Lipschitz  continuity  is  not  difficult  to  prove  as  the  next 
proposition  shows. 

Proposition  1  The  total  variation  function  (30)  is  Lipschitz  continuous  on  the  whole 
space  R’1 . 


14 


Y.  Censor,  R.  Davidi,  G.T.  Herman,  R.W.  Schulte  and  L.  Tetruashvili 


Proof  Each  summand  in  (30)  can  be  written  as 

s]  (Xg+Uh  -  Xg,h)2  +  (Xg,h+1  ~  Xg)h)2  =  ^Ag^X^  ,  (31) 

where  Agj ,  is  a  square  matrix  having  only  two  nonzero  rows,  with  the  first  nonzero 
row  containing  only  two  nonzero  elements  1  and  —1  that  correspond  to  the  variables 
Xg_ )_i /j  and  Xg  ,  respectively,  and  the  second  nonzero  row  containing  only  two  nonzero 
elements  1  and  —1  that  correspond  to  the  variables  Xg  j,  .  \  Xgk,  respectively.  Thus, 
the  TV  function  can  be  rewritten  as 


G— l H- 1 

TV(X)  =  E  E  IK*AII 

3=1  h=  1 


allowing  us  to  estimate  the  Lipscliitz  constant  as  follows: 

\TV(X)  —  TV(Y)\  = 


EE  IK^EI2-IK^1I 

3=1  h=  1 


G-1H-1  G-1H-1 

<EE  \\\Aa,hX\\2-\\AgthY\\2\  <  E  E  IK**  ~A9aY\\ 

3=1  h=  1  3=1  h=  1 

G-1H-1  G—l H—l 

=  EE  \\Ag,h(X-Y)\\2<  J2  E  K/.llall^-^lla, 

g= 1  h= 1  g= 1  h=  1 


(32) 


where  ||j4||,  =  \J Anrax  (ATA)  with  Anrax  denoting  the  maximal  eigenvalue.  Thus  we 
have  shown 

| TV(X)  -  TV(Y) |  <  L  \\X  -  Y\\2  ,  (33) 

with  L  =  EEi1  'Eh= i  \\Aa,h ||2- 

We  note  that  the  function  f(x)  =  j|a;  — qfe||2  is  convex,  continuously  differen¬ 
tiable,  and  fulfills  (17)  with  L  =  1.  The  algorithm  that  we  use  to  realize  the  PSM  is 
as  follows. 

The  PSM  Algorithm 

(PI)  Initialization:  Select  a  point  x°  £  RJ ,  select  integers  K  and  M,  use  two  real 
number  variables  curr  and  prev ,  and  set  curr  =  <fi  (ar)  and  prev  =  curr. 

(P2)  Iterative  step:  Given  the  current  iterate  xk ,  calculate  the  next  iterate  xk+1  as 
follows: 

(P2.1)  Calculate  a  subgradient  of  (j>  at  xk ,  i.e. ,  <f>  (xk^  £  dr/)(xk^,  a  step-size  t k  = 
k~ i/4/  (xk^J  |  2  and  the  vector 

qk  =  xk  -  tk<f>  (xk^j  .  (34) 

(P2.2)  Apply  Nesterov’s  Algorithm  to  solve  the  problem 

minimize  ^\\x- gfe||  2  |  Ax  =  b  and  0  <  x  <  l|  .  (35) 
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(P2.3)  Update  to  find  the  next  iterate 


fc+1 

x  =  arg  mm 

X 


1 

2 


2 


Ax  =  b  and  0  <  x  <  1  >  . 


(36) 


(P2.4)  If  <j>  <  curr  then  curr  =  <j>  ^x^-1-1^. 

(P3)  Stopping  rule:  If  km.odK  =  0  (i.e. ,  k  is  divisible  by  A'),  then: 

If  prev  —  curr  <  prev  /  M  then  stop.  Otherwise,  prev  =  curr  and  go  to  (2). 

In  step  (P2.2)  above,  problem  (35)  is  solved  as  follows.  From  a  given  problem  (35) 
pass  to  its  dual 

maximize  {/(A)  |  AGE7},  (37) 

where 

/(A)  =  I  Lfc  -  ata  -  pCi+1  (qk  -  AT a)  II2  -  i  Lfc  -  atx\\2 

1  II  i.i|2  (38) 

—  (A,  b)  +  2  If?  I 

and  PcI+ 1  denotes  the  orthogonal  projection  onto  the  set  |x  G  RJ  \  0  <  x  <  l|.  De¬ 
note  by  \*k  the  optimal  solution  of  problem  (37).  Then  the  optimal  point  x*k  of  the 
problem  (35)  is  given  by 

x*k  =  PCl+1  ( qk  -  AT A*fc)  .  (39) 

We  apply  Nesterov’s  Algorithm  for  finding  an  optimal  solution  X*k  of  the  problem  (37). 

In  the  reported  experiments  we  used  the  starting  points  x°  and  yQ  in  the  PSM 
Algorithm  and  in  Nesterov’s  Algorithm,  respectively,  to  be  zero  vectors.  In  the  ini¬ 
tialization  step  of  the  PSM  Algorithm  we  selected  K  =  10  and  M  =  5000.  In  the 
initialization  step  of  Nesterov’s  Algorithm  we  chose  a_i  =  10. 


5.2.2  Superiorization  Method  (SM) 

Our  selected  choice  for  the  operator  in  the  Basic  Algorithm  as  well  as  in  the 
Superiorized  Version  of  the  Basic  Algorithm,  as  described  in  Section  4,  is  based  on 
an  algebraic  reconstruction  technique  (ART),  see  [20,  Chapter  11].  Specifically,  for 
i  —  1.2 . /.  we  define  the  operators  Ui  :  RJ  — >  RJ  by 

bi-la\x )  . 

17i(x)  =  xH - — — 2 — —a.  (40) 

||  a*  || 

Defining  Q  :  RJ  — >  fi  by 

{Xj,  if  0  <  Xj  <  1, 

0,  if  xj  <  0,  (41) 

1,  if  1  <  Xj, 

for  j  =  1,  2, ...,  J,  we  specify  the  algorithmic  operator  Aq  :  f2  — ►  1?  by 

Ac  (x)  =  QUi  -  -U2U1(x). 


(42) 
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Since  the  individual  C/jS  as  well  as  the  Q  are  clearly  nonexpansive  operators,  the  same 
is  true  for  Ap. 

By  well-known  properties  of  ART  (see,  for  example,  Sections  11.2  and  15.8  of 
[20]),  the  Basic  Algorithm  with  this  algorithmic  operator  is  convergent  over  Q  and, 
in  fact,  for  every  ar  £  1?,  the  limit  y  (ar)  is  in  C.  It  follows  that,  for  every  x°  £  12, 
Proxc  ( y  (a:0))  =  0  and  so  the  Basic  Algorithm  is  boundedly  convergent.  According  to 
Theorem  1,  this  combined  with  the  facts  that  Aq  is  nonexpansive  and  the  proximity 
function  Proxc  is  uniformly  continuous,  implies  that  the  Basic  Algorithm  defined  by 
Aq  is  strongly  perturbation  resilient. 

Recalling  the  discussion  just  below  the  description  of  the  Superiorized  Version  of 
the  Basic  Algorithm,  we  make  further  use  of  the  convergence  of  the  Basic  algorithm 
to  an  element  of  C  as  follows.  Since  for  all  e  >  0,  the  e-output  O  (c,£,  |a:fc|  ^ 
of  the  Basic  Algorithm  is  defined  for  every  x°  £  1?,  we  also  have  that  every  sequence 
y  >  generated  by  the  Superiorized  Version  of  the  Basic  Algorithm  has  an  e' - 

k=0  OO 

output  O  [c,e' ,  )  for  every  e'  >  0.  This  means  that  for  the  specific  type  of  C 


that  is  used  in  our  comparative  study,  the  Superiorized  Version  of  the  Basic  Algorithm 
is  guaranteed  to  produce  an  e -compatible  output  for  any  e'  >  0  and  any  initial  point 
y°  £  17. 

The  specific  choices  made  when  running  the  Superiorized  Version  of  the  Basic  Al¬ 
gorithm  for  our  comparative  study  were  the  following.  We  selected  ryi  =  0.999  ,  y° 
to  be  the  zero  vector  and  N  =  9.  All  these  choices  we  made  are  based  on  auxiliary 
experiments  (not  included  in  this  paper)  that  helped  determine  optimal  parameters 
for  the  data-set  discussed  in  Subsection  5.3.  In  addition,  we  need  to  specify  how  the 
nonascending  vector  was  selected  in  line  8  of  the  Superiorized  Version  of  the  Basic  algo¬ 
rithm.  This  was  done  by  the  method  specified  in  [22],  Section  III. A  following  equation 
(12). 


5.3  The  computational  result 

The  computational  work  reported  here  was  done  on  a  single  machine  using  a  single 
CPU,  an  Intel  i5-3570K  3.4  Gliz  with  16  GB  RAM  using  the  SNARK09  software  pack¬ 
age  [18,24];  the  phantom,  the  data,  the  reconstructions  and  displays  were  all  generated 
within  this  same  framework.  In  particular,  this  implies  that  differences  in  the  reported 
reconstruction  times  are  not  due  to  the  different  algorithms  being  implemented  in 
different  environments. 

Figure  1  shows  the  phantom  used  in  our  study,  which  is  a  485  x  485  digitized  image. 
The  phantom  corresponds  to  a  cross-section  of  a  human  head  (based  on  [20,  Figure 
4.6]).  It  is  represented  by  a  vector  with  235,225  components,  each  standing  for  the 
average  x-ray  attenuation  coefficient  within  a  pixel.  Each  pixel  is  of  size  0.376  x  0.376 
mm2.  The  values  of  the  components  are  in  the  range  of  [0,  0.6241749],  however  the 
display  range  used  here  was  much  smaller  [0.204,  0.21675].  The  mapping  between  the 
two  ranges  is  such  that  any  value  below  0.204  is  shown  as  black  and  any  value  above 
0.21675  is  shown  as  white  with  a  linear  mapping  in-between.  We  used  this  display 
window  for  all  images  presented  here. 

Data  were  collected  by  calculating  line  integrals  through  the  digitized  head  phantom 
in  Figure  1  using  60  sets  of  equally  rotated  (in  3  degrees  increments)  parallel  lines,  with 
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Fig.  1:  The  head  phantom.  The  value  of  its  TV  is  984.  Its  tomographic  data  was  obtained  for 
60  views. 


lines  in  each  set  spaced  at  0.752  mm  from  each  other.  Each  line  integral  gives  rise  to 
a  linear  equation  and  represents  a  livperplane  in  RJ .  The  phantom  itself  lies  in  the 
intersection  of  all  the  liyperplanes  that  are  associated  with  these  lines  and  it  also 
satisfies  the  box  constraints  in  (28)  The  total  number  of  linear  equations  is  18,524, 
making  our  problem  underdetermined  with  235,  225  unknowns  (the  intersection  of  all 
the  hyperplanes  is  an  at  least  in  a  216,  701-dimensional  subspace  of  i?235’225).  In  the 
comparative  study,  we  first  applied  the  PSM  and  then  the  SM  to  these  data  as  follows. 

The  PSM  was  implemented  as  described  in  Subsection  5.2.1.  In  particular,  it  started 
with  x°  =  0  (the  zero  vector  all  of  whose  components  are  0),  for  which  Proxc  (*°)  = 
326.  It  was  stopped  according  to  the  Stopping  Rule  (P3),  with  K  =  10  and  M  =  5000. 
The  iteration  number  at  that  time  was  815  and  the  value  of  the  proximity  function 
was  Proxc  (®815)  =  0.0422,  which  is  very  much  smaller  than  the  value  at  the  initial 
point.  The  length  of  computer  time  required  was  2217  seconds.  The  TV  of  the  output 
was  919,  which  is  less  than  that  of  the  phantom,  indicating  that  PSM  is  performing 
its  task  of  producing  a  constraints-compatible  output  with  a  lo-w  TV.  This  output  is 
shown  in  Figure  2(a). 

We  used  the  Superiorized  Version  of  the  Basic  Algorithm,  as  described  in  Subsection 
5.2.2  to  generate  a  sequence  until  it  reached  O  (c,  0.0422,  ^  and 

considered  that  to  be  the  output  of  SM.  We  know  that  this  output  must  exist  for  our 
problem  and  that  its  constraints-compatibility  will  not  be  greater  than  that  of  the 
output  of  PSM.  The  length  of  computer  time  required  to  obtain  this  output  was  102 
seconds,  which  is  over  twenty  times  shorter  than  what  was  needed  by  PSM  to  get  its 
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Fig.  2:  Reconstructions  of  the  head  phantom  of  Figure  1.  (a)  The  image  reconstructed  by  the 
PSM  has  TV  =  919  and  was  obtained  after  2217  seconds,  (b)  The  image  reconstructed  by  the 
SM  has  TV  =  873  and  was  obtained  after  102  seconds. 
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TV  value 

Time  (seconds) 

PSM 

919 

2217 

SM 

873 

102 

Table  1:  Performance  comparison  of  PSM  and  SM  when  producing  the  reconstructions  in 
Figure  2. 


output.  The  TV  of  the  SM  output  was  876,  which  is  also  less  than  that  of  the  output 
of  PSM.  The  SM  output  is  shown  in  Figure  2(b). 

As  summarized  in  Table  1,  with  the  stopping  rule  that  guarantees  that  the  output 
of  the  SM  is  at  least  as  constraints-compatible  as  the  output  of  the  PSM,  the  SM 
showed  clearly  superior  efficacy  to  the  PSM:  it  obtained  a  result  with  a  lower  TV  value 
at  less  than  one  twentieth  of  the  computational  cost. 


6  Conclusions 

The  superiorization  methodology  (SM)  allows  the  conversion  of  a  feasibility-seeking 
algorithm,  designed  to  find  an  ^-compatible  solution  of  the  constraints,  into  a  superi- 
orized  algorithm  that  inserts,  into  the  feasibility-seeking  algorithm,  objective  function 
reduction  steps  without  ruining  the  guaranteed  feasibility-seeking  nature  of  the  algo¬ 
rithm.  The  superiorized  algorithm  interlaces  objective  function  nonascent  steps  into  the 
original  process  in  an  automatic  manner.  In  case  of  strong  perturbation  resilience  of 
the  original  feasibility-seeking  algorithm,  mathematical  results  indicate  why  the  supe¬ 
riorized  algorithm  will  be  efficacious  for  producing  an  an  £-conrpatible  solution  output 
with  a  low  value  of  the  objective  function. 

We  have  presented  an  example  for  which  the  SM  finds  a  better  solution  to  a  con¬ 
strained  minimization  problems  than  the  projected  subgradient  method  (PSM),  and 
in  significantly  less  computation  time.  This  finding  is  understandable  in  view  of  the 
nature  of  how  the  methods  interlace  feasibility  oriented  activities  with  optimization  ac¬ 
tivities.  While  the  PSM  requires  a  projection  onto  the  feasible  region  of  the  constrained 
minimization  problem,  the  SM  needs  to  do  only  projections  onto  the  individual  con¬ 
straints  whose  intersection  is  the  feasible  region.  We  demonstrated  this  experimentally 
on  a  large-sized  application  that  is  modeled,  and  needs  to  be  solved,  as  a  constrained 
minimization  problem. 
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